1. 24 Jul 2019 (1 commit)
  2. 13 Jul 2019 (4 commits)
  3. 04 Jul 2019 (1 commit)
  4. 01 Jul 2019 (1 commit)
  5. 28 Jun 2019 (1 commit)
    • arch: wire-up pidfd_open() · 7615d9e1
      Committed by Christian Brauner
      This wires up the pidfd_open() syscall into all arches at once.
      Signed-off-by: Christian Brauner <christian@brauner.io>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jann Horn <jannh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-api@vger.kernel.org
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linux-mips@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: linux-arch@vger.kernel.org
      Cc: x86@kernel.org
      7615d9e1
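      For reference, a minimal userspace sketch of exercising the new
      syscall. This is an illustration, not part of the patch: it assumes
      the syscall number 434 assigned in this series and calls syscall(2)
      directly, since no libc wrapper existed at the time.

        #define _GNU_SOURCE
        #include <stdio.h>
        #include <sys/syscall.h>
        #include <sys/types.h>
        #include <sys/wait.h>
        #include <unistd.h>

        #ifndef __NR_pidfd_open
        #define __NR_pidfd_open 434     /* number assigned in this series */
        #endif

        int main(void)
        {
                pid_t pid = fork();

                if (pid == 0) {         /* child: exit after a moment */
                        sleep(1);
                        return 0;
                }

                /* Obtain a file descriptor referring to the child process. */
                int pidfd = syscall(__NR_pidfd_open, pid, 0);
                if (pidfd < 0) {
                        perror("pidfd_open");
                        return 1;
                }

                printf("pidfd %d refers to pid %d\n", pidfd, pid);
                close(pidfd);
                waitpid(pid, NULL, 0);
                return 0;
        }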
  6. 25 Jun 2019 (1 commit)
    • powerpc/64s/exception: Fix machine check early corrupting AMR · e13e7cd4
      Committed by Nicholas Piggin
      The early machine check runs in real mode, so locking is unnecessary.
      Worse, the windup does not restore AMR, so this can result in a false
      KUAP fault after a recoverable machine check hits inside a user copy
      operation.
      
      Fix this similarly to HMI by just avoiding the kuap lock in the
      early machine check handler (it will be set by the late handler that
      runs in virtual mode if that runs). If the virtual mode handler is
      reached, it will lock and restore the AMR.
      
      Fixes: 890274c2 ("powerpc/64s: Implement KUAP for Radix MMU")
      Cc: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e13e7cd4
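      A rough C rendering of the resulting flow, for orientation only: the
      real logic lives in the 64s exception assembly macros, and every name
      below is an illustrative stand-in.

        /* Illustrative sketch; not the actual exception macros. */
        void machine_check_early_sketch(void)   /* runs in real mode */
        {
                /*
                 * No kuap lock here: this path's windup does not restore
                 * AMR, so locking would leave KUAP engaged and raise a
                 * false fault if the MCE interrupted a user copy.
                 */
                handle_mce_early();              /* stand-in name */
        }

        void machine_check_late_sketch(void)     /* virtual mode, if reached */
        {
                kuap_save_amr_and_lock();        /* lock; windup restores AMR */
                handle_mce();                    /* stand-in name */
        }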
  7. 24 Jun 2019 (1 commit)
    • bus_find_device: Unify the match callback with class_find_device · 418e3ea1
      Committed by Suzuki K Poulose
      There is an arbitrary difference between the prototypes of
      bus_find_device() and class_find_device(): the const qualifier used
      in the prototype of class_find_device() prevents their callers from
      passing the same pair of data and match() arguments to both of them.
      If that qualifier is also used in the bus_find_device() prototype,
      the same match() callback function can be passed to both
      bus_find_device() and class_find_device(), which allows some
      optimizations to be made to avoid code duplication going forward.
      Also with that, constify the "data" parameter, as it is passed as a
      const pointer to the match function.
      
      For this reason, change the prototype of bus_find_device() to match
      that of class_find_device(), and adjust its callers to use the const
      qualifier in accordance with the new prototype.
      
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Andreas Noever <andreas.noever@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Corey Minyard <minyard@acm.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David Kershner <david.kershner@unisys.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Felipe Balbi <balbi@kernel.org>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: Harald Freudenberger <freude@linux.ibm.com>
      Cc: Hartmut Knaack <knaack.h@gmx.de>
      Cc: Heiko Stuebner <heiko@sntech.de>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Jamet <michael.jamet@intel.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
      Cc: Sebastian Ott <sebott@linux.ibm.com>
      Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Cc: Yehezkel Bernat <YehezkelShB@gmail.com>
      Cc: rafael@kernel.org
      Acked-by: Corey Minyard <minyard@acm.org>
      Acked-by: David Kershner <david.kershner@unisys.com>
      Acked-by: Mark Brown <broonie@kernel.org>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Acked-by: Wolfram Sang <wsa@the-dreams.de> # for the I2C parts
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      418e3ea1
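      With the unified prototypes, one callback can serve both lookups. A
      minimal kernel-context sketch (match_by_name() and find_by_name()
      are hypothetical, written against the prototypes as described):

        #include <linux/device.h>
        #include <linux/string.h>

        /* One match() shape now fits both APIs:
         *   int (*match)(struct device *dev, const void *data)
         */
        static int match_by_name(struct device *dev, const void *data)
        {
                return !strcmp(dev_name(dev), data);
        }

        /* The same callback works for a bus or a class lookup. Both
         * return a referenced device; the caller must put_device(). */
        static struct device *find_by_name(struct bus_type *bus,
                                           struct class *cls,
                                           const char *name)
        {
                struct device *dev = bus_find_device(bus, NULL, name,
                                                     match_by_name);

                if (!dev && cls)
                        dev = class_find_device(cls, NULL, name,
                                                match_by_name);
                return dev;
        }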
  8. 23 Jun 2019 (1 commit)
  9. 20 Jun 2019 (1 commit)
    • KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries · 50087112
      Committed by Suraj Jitindar Singh
      When a guest vcpu moves from one physical thread to another, it is
      necessary for the host to perform a tlb flush on the previous core if
      another vcpu from the same guest is going to run there. This is
      because the guest may use the local form of the tlb invalidation
      instruction, meaning stale tlb entries would persist where it
      previously ran. This is handled on guest entry in
      kvmppc_check_need_tlb_flush(), which calls flush_guest_tlb() to
      perform the tlb flush.
      
      Previously the generic radix__local_flush_tlb_lpid_guest() function
      was used; however, the functionality was reimplemented in
      flush_guest_tlb() to avoid the trace_tlbie() call, as the flushing
      may be done in real mode. The reimplementation in flush_guest_tlb()
      was missing an erat invalidation after flushing the tlb.
      
      This led to observable memory corruption in the guest due to the
      caching of stale translations. Fix this by adding the erat invalidation.
      
      Fixes: 70ea13f6 ("KVM: PPC: Book3S HV: Flush TLB on secondary radix threads")
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      50087112
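      The fix is essentially one added instruction after the flush loop. A
      simplified sketch of flush_guest_tlb() (loop body elided;
      PPC_INVALIDATE_ERAT is the kernel's existing helper for discarding
      the ERAT on Power9):

        static void flush_guest_tlb_sketch(struct kvm *kvm)
        {
                int set;

                asm volatile("ptesync" : : : "memory");
                for (set = 0; set < kvm->arch.tlb_sets; ++set) {
                        /* tlbiel this set; RIC/IS operand details elided */
                }
                asm volatile("ptesync" : : : "memory");
                /* The missing step: also invalidate the ERAT, so stale
                 * cached translations cannot outlive the tlb flush. */
                asm volatile(PPC_INVALIDATE_ERAT : : : "memory");
        }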
  10. 19 Jun 2019 (4 commits)
  11. 18 Jun 2019 (2 commits)
    • KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode · 84b02824
      Committed by Suraj Jitindar Singh
      The hcall H_SET_DAWR is used by a guest to set the data address
      watchpoint register (DAWR). This hcall is handled in the host in
      kvmppc_h_set_dawr() which can be called in either real mode on the
      guest exit path from hcall_try_real_mode() in book3s_hv_rmhandlers.S,
      or in virtual mode when called from kvmppc_pseries_do_hcall() in
      book3s_hv.c.
      
      The function kvmppc_h_set_dawr() updates the dawr and dawrx fields in
      the vcpu struct accordingly and then also writes the respective values
      into the DAWR and DAWRX registers directly. It is necessary to write
      the registers directly here when calling the function in real mode,
      since the path to re-enter the guest won't do this. However, when in
      virtual mode the host DAWR and DAWRX values have already been
      restored, so writing the registers would overwrite them.
      Additionally, there is no reason to write the guest values here, as
      they will be read from the vcpu struct and written to the registers
      appropriately the next time the vcpu is run.
      
      This also avoids the case when handling h_set_dawr for a nested guest
      where the guest hypervisor isn't able to write the DAWR and DAWRX
      registers directly and must rely on the real hypervisor to do this for
      it when it calls H_ENTER_NESTED.
      
      Fixes: c1fe190c ("powerpc: Add force enable of DAWR on P9 option")
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      84b02824
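      Rendered as C for clarity, this is roughly the resulting logic (the
      real handler is assembly in book3s_hv_rmhandlers.S; the in_real_mode
      flag is an illustrative stand-in for which call path was taken):

        static long kvmppc_h_set_dawr_sketch(struct kvm_vcpu *vcpu,
                                             bool in_real_mode,
                                             unsigned long dawr,
                                             unsigned long dawrx)
        {
                /* Always record the guest values; they are written to the
                 * registers from the vcpu struct on the next guest entry. */
                vcpu->arch.dawr  = dawr;
                vcpu->arch.dawrx = dawrx;

                /* Only the real-mode exit path also writes the SPRs now,
                 * because it re-enters the guest without reloading them.
                 * In virtual mode the host values are already restored
                 * and must not be clobbered. */
                if (in_real_mode) {
                        mtspr(SPRN_DAWR, dawr);
                        mtspr(SPRN_DAWRX, dawrx);
                }
                return H_SUCCESS;
        }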
    • KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr() · fabb2efc
      Committed by Michael Neuling
      Commit c1fe190c ("powerpc: Add force enable of DAWR on P9 option")
      screwed up some assembler and corrupted a pointer in r3. This resulted
      in crashes like the below:
      
        BUG: Kernel NULL pointer dereference at 0x000013bf
        Faulting instruction address: 0xc00000000010b044
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc4+ #3
        NIP:  c00000000010b044 LR: c0080000089dacf4 CTR: c00000000010aff4
        REGS: c00000179b397710 TRAP: 0300   Not tainted  (5.2.0-rc4+)
        MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 42244842  XER: 00000000
        CFAR: c00000000010aff8 DAR: 00000000000013bf DSISR: 42000000 IRQMASK: 0
        GPR00: c0080000089dd6bc c00000179b3979a0 c008000008a04300 ffffffffffffffff
        GPR04: 0000000000000000 0000000000000003 000000002444b05d c0000017f11c45d0
        ...
        NIP kvmppc_h_set_dabr+0x50/0x68
        LR  kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv]
        Call Trace:
          0xc0000017f11c0000 (unreliable)
          kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
          kvmppc_vcpu_run+0x34/0x48 [kvm]
          kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
          kvm_vcpu_ioctl+0x460/0x850 [kvm]
          do_vfs_ioctl+0xe4/0xb40
          ksys_ioctl+0xc4/0x110
          sys_ioctl+0x28/0x80
          system_call+0x5c/0x70
        Instruction dump:
        4082fff4 4c00012c 38600000 4e800020 e96280c0 896b0000 2c2b0000 3860ffff
        4d820020 50852e74 508516f6 78840724 <f88313c0> f8a313c8 7c942ba6 7cbc2ba6
      
      Fix the bug by only changing r3 when we are returning immediately.
      
      Fixes: c1fe190c ("powerpc: Add force enable of DAWR on P9 option")
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reported-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      fabb2efc
  12. 16 Jun 2019 (1 commit)
  13. 15 Jun 2019 (6 commits)
  14. 14 Jun 2019 (2 commits)
  15. 12 Jun 2019 (1 commit)
    • powerpc/mm/64s/hash: Reallocate context ids on fork · ca72d883
      Committed by Michael Ellerman
      When using the Hash Page Table (HPT) MMU, userspace memory mappings
      are managed at two levels. Firstly in the Linux page tables, much like
      other architectures, and secondly in the SLB (Segment Lookaside
      Buffer) and HPT. It's the SLB and HPT that are actually used by the
      hardware to do translations.
      
      As part of the series adding support for 4PB user virtual address
      space using the hash MMU, we added support for allocating multiple
      "context ids" per process, one for each 512TB chunk of address space.
      These are tracked in an array called extended_id in the mm_context_t
      of a process that has done a mapping above 512TB.
      
      If such a process forks (i.e. clone(2) without CLONE_VM set), its mm
      is copied, including the mm_context_t, and then init_new_context() is
      called to reinitialise parts of the mm_context_t as appropriate to
      separate the address spaces of the two processes.
      
      The key step in ensuring the two processes have separate address
      spaces is to allocate a new context id for the process; this is done
      at the beginning of hash__init_new_context(). If we didn't allocate a
      new context id, the two processes would share mappings as far as the
      SLB and HPT are concerned, even though their Linux page tables would
      be separate.
      
      For mappings above 512TB, which use the extended_id array, we
      neglected to allocate new context ids on fork, meaning the parent and
      child use the same ids and therefore share those mappings even though
      they're supposed to be separate. This can lead to the parent seeing
      writes done by the child, which is essentially memory corruption.
      
      There is an additional exposure: if the child process exits, all its
      context ids are freed, including the context ids that are still in
      use by the parent for mappings above 512TB. One or more of those ids
      can then be reallocated to a third process, which can then read/write
      to the parent's mappings above 512TB. Additionally
      if the freed id is used for the third process's primary context id,
      then the parent is able to read/write to the third process's mappings
      *below* 512TB.
      
      All of these are fundamental failures to enforce separation between
      processes. The only mitigating factor is that the bug only occurs if a
      process creates mappings above 512TB, and most applications still do
      not create such mappings.
      
      Only machines using the hash page table MMU are affected, eg. PowerPC
      970 (G5), PA6T, Power5/6/7/8/9. By default Power9 bare metal machines
      (powernv) use the Radix MMU and are not affected, unless the machine
      has been explicitly booted in HPT mode (using disable_radix on the
      kernel command line). KVM guests on Power9 may be affected if the host
      or guest is configured to use the HPT MMU. LPARs under PowerVM on
      Power9 are affected as they always use the HPT MMU. Kernels built with
      PAGE_SIZE=4K are not affected.
      
      The fix is relatively simple, we need to reallocate context ids for
      all extended mappings on fork.
      
      Fixes: f384796c ("powerpc/mm: Add support for handling > 512TB address in SLB miss")
      Cc: stable@vger.kernel.org # v4.17+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ca72d883
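      The shape of the fix, as a simplified sketch of the extended-id
      handling on fork (error unwinding elided; realloc_context_ids() is
      named after the idea and may not match the merged helper exactly):

        static int realloc_context_ids(mm_context_t *ctx)
        {
                int i, id;

                /* Each in-use slot maps a 512TB chunk; the child must get
                 * fresh ids or it will share SLB/HPT translations with
                 * the parent above 512TB. */
                for (i = 0; i < ARRAY_SIZE(ctx->extended_id); i++) {
                        if (!ctx->extended_id[i])
                                continue;

                        id = hash__alloc_context_id();
                        if (id < 0)
                                return id;  /* unwind elided in this sketch */
                        ctx->extended_id[i] = id;
                }
                return 0;
        }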
  16. 09 Jun 2019 (1 commit)
  17. 07 Jun 2019 (4 commits)
    • powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX · c21f5a9e
      Committed by Christophe Leroy
      When booting through OF, setup_disp_bat() does nothing because
      disp_BAT is not set. By chance, it used to work because the BOOTX
      buffer is mapped 1:1 at address 0x81000000 by the bootloader, and
      btext_setup_display() sets the virtual address equal to the physical
      address.
      
      But since commit 215b8237 ("powerpc/32s: set up an early static
      hash table for KASAN."), a temporary page table overrides the
      bootloader mapping.
      
      This 0x81000000 is also problematic with the newly implemented
      Kernel Userspace Access Protection (KUAP) because it is within user
      address space.
      
      This patch fixes those issues by properly setting disp_BAT through
      a call to btext_prepare_BAT(), allowing setup_disp_bat() to
      properly set up BAT3 for early bootx screen buffer access.
      Reported-by: Mathieu Malaterre <malat@debian.org>
      Fixes: 215b8237 ("powerpc/32s: set up an early static hash table for KASAN.")
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Tested-by: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c21f5a9e
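      The fix boils down to one extra call in the bootx setup path, so
      disp_BAT is populated before setup_disp_bat() runs. A simplified
      sketch (the wrapper function here is illustrative):

        static void __init bootx_btext_setup_sketch(int width, int height,
                                                    int depth, int pitch,
                                                    unsigned long address)
        {
                /* Record the bootx framebuffer for btext, as before. */
                btext_setup_display(width, height, depth, pitch, address);

                /* New: fill in disp_BAT now, so setup_disp_bat() can
                 * later install BAT3 for the screen buffer once the
                 * temporary (KASAN) page tables replace the bootloader's
                 * 1:1 mapping. */
                btext_prepare_BAT();
        }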
    • powerpc/64s: __find_linux_pte() synchronization vs pmdp_invalidate() · a00196a2
      Committed by Nicholas Piggin
      The change to pmdp_invalidate() to mark the pmd with _PAGE_INVALID
      broke the synchronisation against lock-free lookups:
      __find_linux_pte()'s pmd_none() check no longer returns true for such
      cases.
      
      Fix this by adding a check for this condition as well.
      
      Fixes: da7ad366 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
      Cc: stable@vger.kernel.org # v4.20+
      Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a00196a2
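      Sketched as a predicate over the pmd value that the lock-free walker
      reads (the helper name is hypothetical; pmd_is_serializing() comes
      from the companion patch below):

        /* Hypothetical predicate mirroring the checks in
         * __find_linux_pte(); pmd is a local READ_ONCE() snapshot. */
        static inline bool pmd_usable_for_lockless_walk(pmd_t pmd)
        {
                /* A hugepage collapse marks the pmd none; caught as before. */
                if (pmd_none(pmd))
                        return false;

                /* New: pmdp_invalidate() marks the pmd _PAGE_INVALID and
                 * clears _PAGE_PRESENT, which pmd_none() no longer
                 * catches. Bail out rather than walk a splitting THP. */
                if (pmd_is_serializing(pmd))
                        return false;

                return true;
        }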
    • powerpc/64s: Fix THP PMD collapse serialisation · 33258a1d
      Committed by Nicholas Piggin
      Commit 1b2443a5 ("powerpc/book3s64: Avoid multiple endian
      conversion in pte helpers") changed the actual bitwise tests in
      pte_access_permitted by using pte_write() and pte_present() helpers
      rather than raw bitwise testing _PAGE_WRITE and _PAGE_PRESENT bits.
      
      The pte_present() change now returns true for PTEs which are
      !_PAGE_PRESENT and _PAGE_INVALID, which is the combination used by
      pmdp_invalidate() to synchronize access from lock-free lookups.
      pte_access_permitted() is used by pmd_access_permitted(), so allowing
      lock-free GUP access to proceed with such PTEs breaks this
      synchronisation.
      
      This bug has been observed on a host using the hash page table MMU,
      with random crashes and corruption in guests, usually together with
      bad PMD messages in the host.
      
      Fix this by adding an explicit check in pmd_access_permitted(), and
      documenting the condition explicitly.
      
      The pte_write() change should be okay, and would prevent GUP from
      falling back to the slow path when encountering savedwrite PTEs,
      matching the behaviour of x86 (which does not implement savedwrite).
      
      Fixes: 1b2443a5 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      33258a1d
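      A sketch of the helper and the added guard, following the commit
      text (the exact bit test may differ from the merged code):

        static inline bool pmd_is_serializing(pmd_t pmd)
        {
                /* Set by pmdp_invalidate(): _PAGE_INVALID on,
                 * _PAGE_PRESENT off, yet pte_present() still reports
                 * the entry as present. */
                return (pmd_raw(pmd) & cpu_to_be64(_PAGE_INVALID)) &&
                       !(pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT));
        }

        static inline bool pmd_access_permitted(pmd_t pmd, bool write)
        {
                /* The added guard: never let lock-free GUP proceed past
                 * a pmd that pmdp_invalidate() is using to serialize. */
                if (pmd_is_serializing(pmd))
                        return false;

                return pte_access_permitted(pmd_pte(pmd), write);
        }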
    • powerpc: Fix kexec failure on book3s/32 · 6c284228
      Committed by Christophe Leroy
      In the old days, _PAGE_EXEC didn't exist on 6xx aka book3s/32.
      Therefore, although __mapin_ram_chunk() was already mapping kernel
      text with PAGE_KERNEL_TEXT and the rest with PAGE_KERNEL, the entire
      memory was executable. Part of the memory (the first 512 kbytes) was
      mapped with BATs instead of page tables, but it was also entirely
      mapped as executable.
      
      In commit 385e89d5 ("powerpc/mm: add exec protection on
      powerpc 603"), we started adding exec protection to some 6xx, namely
      the 603, for pages mapped via pagetables.
      
      Then, in commit 63b2bc61 ("powerpc/mm/32s: Use BATs for
      STRICT_KERNEL_RWX"), the exec protection was extended to BAT mapped
      memory, so that really only the kernel text could be executed.
      
      The problem here is that kexec is based on copying some code into the
      upper part of memory and then executing it from there in order to
      install a fresh new kernel at its definitive location.
      
      However, the code is position independent, and the first part of it
      is just there to deactivate the MMU and jump to the second part. So
      it is possible to run this first part in place instead of running the
      copy. Once the MMU is off, there is no protection anymore, and the
      second part of the code will just run as before.
      Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Fixes: 63b2bc61 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      6c284228
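      A simplified sketch of the resulting flow, following the commit's
      description (the non-book3s/32 path and error handling are elided):

        void default_machine_kexec_sketch(struct kimage *image)
        {
                unsigned long page_list = image->head;
                unsigned long buf =
                        (unsigned long)page_address(image->control_code_page);
                unsigned long buf_phys = virt_to_phys((void *)buf);

                /* Still copy the routine: its second half runs from the
                 * control page after the MMU has been turned off. */
                memcpy((void *)buf, relocate_new_kernel,
                       relocate_new_kernel_size);
                flush_icache_range(buf, buf + KEXEC_CONTROL_PAGE_SIZE);

                /* The fix: enter via the kernel-text copy, which is
                 * mapped executable, instead of jumping into the
                 * (non-executable) control page with the MMU still on. */
                relocate_new_kernel(page_list, buf_phys, image->start);
        }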
  18. 06 Jun 2019 (1 commit)
  19. 05 Jun 2019 (6 commits)