1. 27 2月, 2019 1 次提交
    • P
      KVM: PPC: Fix compilation when KVM is not enabled · e74d53e3
      Paul Mackerras 提交于
      Compiling with CONFIG_PPC_POWERNV=y and KVM disabled currently gives
      an error like this:
      
        CC      arch/powerpc/kernel/dbell.o
      In file included from arch/powerpc/kernel/dbell.c:20:0:
      arch/powerpc/include/asm/kvm_ppc.h: In function ‘xics_on_xive’:
      arch/powerpc/include/asm/kvm_ppc.h:625:9: error: implicit declaration of function ‘xive_enabled’ [-Werror=implicit-function-declaration]
        return xive_enabled() && cpu_has_feature(CPU_FTR_HVMODE);
               ^
      cc1: all warnings being treated as errors
      scripts/Makefile.build:276: recipe for target 'arch/powerpc/kernel/dbell.o' failed
      make[3]: *** [arch/powerpc/kernel/dbell.o] Error 1
      
      Fix this by making the xics_on_xive() definition conditional on the
      same symbol (CONFIG_KVM_BOOK3S_64_HANDLER) that determines whether we
      include <asm/xive.h> or not, since that's the header that defines
      xive_enabled().
      
      Fixes: 03f95332 ("KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e74d53e3
  2. 22 2月, 2019 3 次提交
    • M
      powerpc/kvm: Save and restore host AMR/IAMR/UAMOR · c3c7470c
      Michael Ellerman 提交于
      When the hash MMU is active the AMR, IAMR and UAMOR are used for
      pkeys. The AMR is directly writable by user space, and the UAMOR masks
      those writes, meaning both registers are effectively user register
      state. The IAMR is used to create an execute only key.
      
      Also we must maintain the value of at least the AMR when running in
      process context, so that any memory accesses done by the kernel on
      behalf of the process are correctly controlled by the AMR.
      
      Although we are correctly switching all registers when going into a
      guest, on returning to the host we just write 0 into all regs, except
      on Power9 where we restore the IAMR correctly.
      
      This could be observed by a user process if it writes the AMR, then
      runs a guest and we then return immediately to it without
      rescheduling. Because we have written 0 to the AMR that would have the
      effect of granting read/write permission to pages that the process was
      trying to protect.
      
      In addition, when using the Radix MMU, the AMR can prevent inadvertent
      kernel access to userspace data, writing 0 to the AMR disables that
      protection.
      
      So save and restore AMR, IAMR and UAMOR.
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NPaul Mackerras <paulus@ozlabs.org>
      c3c7470c
    • A
      KVM: PPC: Book3S: Improve KVM reference counting · 716cb116
      Alexey Kardashevskiy 提交于
      The anon fd's ops releases the KVM reference in the release hook.
      However we reference the KVM object after we create the fd so there is
      small window when the release function can be called and
      dereferenced the KVM object which potentially may free it.
      
      It is not a problem at the moment as the file is created and KVM is
      referenced under the KVM lock and the release function obtains the same
      lock before dereferencing the KVM (although the lock is not held when
      calling kvm_put_kvm()) but it is potentially fragile against future changes.
      
      This references the KVM object before creating a file.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      716cb116
    • J
      KVM: PPC: Book3S HV: Fix build failure without IOMMU support · e40542af
      Jordan Niethe 提交于
      Currently trying to build without IOMMU support will fail:
      
        (.text+0x1380): undefined reference to `kvmppc_h_get_tce'
        (.text+0x1384): undefined reference to `kvmppc_rm_h_put_tce'
        (.text+0x149c): undefined reference to `kvmppc_rm_h_stuff_tce'
        (.text+0x14a0): undefined reference to `kvmppc_rm_h_put_tce_indirect'
      
      This happens because turning off IOMMU support will prevent
      book3s_64_vio_hv.c from being built because it is only built when
      SPAPR_TCE_IOMMU is set, which depends on IOMMU support.
      
      Fix it using ifdefs for the undefined references.
      
      Fixes: 76d837a4 ("KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms")
      Signed-off-by: NJordan Niethe <jniethe5@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e40542af
  3. 21 2月, 2019 7 次提交
    • P
      powerpc/64s: Better printing of machine check info for guest MCEs · c0577201
      Paul Mackerras 提交于
      This adds an "in_guest" parameter to machine_check_print_event_info()
      so that we can avoid trying to translate guest NIP values into
      symbolic form using the host kernel's symbol table.
      Reviewed-by: NAravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c0577201
    • P
      KVM: PPC: Book3S HV: Simplify machine check handling · 884dfb72
      Paul Mackerras 提交于
      This makes the handling of machine check interrupts that occur inside
      a guest simpler and more robust, with less done in assembler code and
      in real mode.
      
      Now, when a machine check occurs inside a guest, we always get the
      machine check event struct and put a copy in the vcpu struct for the
      vcpu where the machine check occurred.  We no longer call
      machine_check_queue_event() from kvmppc_realmode_mc_power7(), because
      on POWER8, when a vcpu is running on an offline secondary thread and
      we call machine_check_queue_event(), that calls irq_work_queue(),
      which doesn't work because the CPU is offline, but instead triggers
      the WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which
      fires again and again because nothing clears the condition).
      
      All that machine_check_queue_event() actually does is to cause the
      event to be printed to the console.  For a machine check occurring in
      the guest, we now print the event in kvmppc_handle_exit_hv()
      instead.
      
      The assembly code at label machine_check_realmode now just calls C
      code and then continues exiting the guest.  We no longer either
      synthesize a machine check for the guest in assembly code or return
      to the guest without a machine check.
      
      The code in kvmppc_handle_exit_hv() is extended to handle the case
      where the guest is not FWNMI-capable.  In that case we now always
      synthesize a machine check interrupt for the guest.  Previously, if
      the host thinks it has recovered the machine check fully, it would
      return to the guest without any notification that the machine check
      had occurred.  If the machine check was caused by some action of the
      guest (such as creating duplicate SLB entries), it is much better to
      tell the guest that it has caused a problem.  Therefore we now always
      generate a machine check interrupt for guests that are not
      FWNMI-capable.
      Reviewed-by: NAravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      884dfb72
    • M
      KVM: PPC: Book3S HV: Context switch AMR on Power9 · d976f680
      Michael Ellerman 提交于
      kvmhv_p9_guest_entry() implements a fast-path guest entry for Power9
      when guest and host are both running with the Radix MMU.
      
      Currently in that path we don't save the host AMR (Authority Mask
      Register) value, and we always restore 0 on return to the host. That
      is OK at the moment because the AMR is not used for storage keys with
      the Radix MMU.
      
      However we plan to start using the AMR on Radix to prevent the kernel
      from reading/writing to userspace outside of copy_to/from_user(). In
      order to make that work we need to save/restore the AMR value.
      
      We only restore the value if it is different from the guest value,
      which is already in the register when we exit to the host. This should
      mean we rarely need to actually restore the value when running a
      modern Linux as a guest, because it will be using the same value as
      us.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Tested-by: NRussell Currey <ruscur@russell.cc>
      d976f680
    • N
      KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start · dee339b5
      Nir Weiner 提交于
      grow_halt_poll_ns() have a strange behaviour in case
      (vcpu->halt_poll_ns != 0) &&
      (vcpu->halt_poll_ns < halt_poll_ns_grow_start).
      
      In this case, vcpu->halt_poll_ns will be multiplied by grow factor
      (halt_poll_ns_grow) which will require several grow iteration in order
      to reach a value bigger than halt_poll_ns_grow_start.
      This means that growing vcpu->halt_poll_ns from value of 0 is slower
      than growing it from a positive value less than halt_poll_ns_grow_start.
      Which is misleading and inaccurate.
      
      Fix issue by changing grow_halt_poll_ns() to set vcpu->halt_poll_ns
      to halt_poll_ns_grow_start in any case that
      (vcpu->halt_poll_ns < halt_poll_ns_grow_start).
      Regardless if vcpu->halt_poll_ns is 0.
      
      use READ_ONCE to get a consistent number for all cases.
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NNir Weiner <nir.weiner@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      dee339b5
    • N
      KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter · 49113d36
      Nir Weiner 提交于
      The hard-coded value 10000 in grow_halt_poll_ns() stands for the initial
      start value when raising up vcpu->halt_poll_ns.
      It actually sets the first timeout to the first polling session.
      This value has significant effect on how tolerant we are to outliers.
      On the standard case, higher value is better - we will spend more time
      in the polling busyloop, handle events/interrupts faster and result
      in better performance.
      But on outliers it puts us in a busy loop that does nothing.
      Even if the shrink factor is zero, we will still waste time on the first
      iteration.
      The optimal value changes between different workloads. It depends on
      outliers rate and polling sessions length.
      As this value has significant effect on the dynamic halt-polling
      algorithm, it should be configurable and exposed.
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NNir Weiner <nir.weiner@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      49113d36
    • N
      KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns · 7fa08e71
      Nir Weiner 提交于
      grow_halt_poll_ns() have a strange behavior in case
      (halt_poll_ns_grow == 0) && (vcpu->halt_poll_ns != 0).
      
      In this case, vcpu->halt_pol_ns will be set to zero.
      That results in shrinking instead of growing.
      
      Fix issue by changing grow_halt_poll_ns() to not modify
      vcpu->halt_poll_ns in case halt_poll_ns_grow is zero
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NNir Weiner <nir.weiner@oracle.com>
      Suggested-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7fa08e71
    • S
      KVM: Call kvm_arch_memslots_updated() before updating memslots · 15248258
      Sean Christopherson 提交于
      kvm_arch_memslots_updated() is at this point in time an x86-specific
      hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
      the memslots generation number in its MMIO sptes in order to avoid
      full page fault walks for repeat faults on emulated MMIO addresses.
      Because only 19 bits are used, wrapping the MMIO generation number is
      possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
      the generation has changed so that it can invalidate all MMIO sptes in
      case the effective MMIO generation has wrapped so as to avoid using a
      stale spte, e.g. a (very) old spte that was created with generation==0.
      
      Given that the purpose of kvm_arch_memslots_updated() is to prevent
      consuming stale entries, it needs to be called before the new generation
      is propagated to memslots.  Invalidating the MMIO sptes after updating
      memslots means that there is a window where a vCPU could dereference
      the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
      spte that was created with (pre-wrap) generation==0.
      
      Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      15248258
  4. 19 2月, 2019 7 次提交
    • S
      KVM: PPC: Book3S HV: Add KVM stat largepages_[2M/1G] · 8f1f7b9b
      Suraj Jitindar Singh 提交于
      This adds an entry to the kvm_stats_debugfs directory which provides the
      number of large (2M or 1G) pages which have been used to setup the guest
      mappings, for radix guests.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      8f1f7b9b
    • A
      KVM: PPC: Release all hardware TCE tables attached to a group · a67614cc
      Alexey Kardashevskiy 提交于
      The SPAPR TCE KVM device references all hardware IOMMU tables assigned to
      some IOMMU group to ensure that in-kernel KVM acceleration of H_PUT_TCE
      can work. The tables are references when an IOMMU group gets registered
      with the VFIO KVM device by the KVM_DEV_VFIO_GROUP_ADD ioctl;
      KVM_DEV_VFIO_GROUP_DEL calls into the dereferencing code
      in kvm_spapr_tce_release_iommu_group() which walks through the list of
      LIOBNs, finds a matching IOMMU table and calls kref_put() when found.
      
      However that code stops after the very first successful derefencing
      leaving other tables referenced till the SPAPR TCE KVM device is destroyed
      which normally happens on guest reboot or termination so if we do hotplug
      and unplug in a loop, we are leaking IOMMU tables here.
      
      This removes a premature return to let kvm_spapr_tce_release_iommu_group()
      find and dereference all attached tables.
      
      Fixes: 121f80ba ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      a67614cc
    • S
      KVM: PPC: Book3S HV: Optimise mmio emulation for devices on FAST_MMIO_BUS · 1b642257
      Suraj Jitindar Singh 提交于
      Devices on the KVM_FAST_MMIO_BUS by definition have length zero and are
      thus used for notification purposes rather than data transfer. For
      example eventfd for virtio devices.
      
      This means that when emulating mmio instructions which target devices on
      this bus we can immediately handle them and return without needing to load
      the instruction from guest memory.
      
      For now we restrict this to stores as this is the only use case at
      present.
      
      For a normal guest the effect is negligible, however for a nested guest
      we save on the order of 5us per access.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1b642257
    • P
      KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE · 03f95332
      Paul Mackerras 提交于
      Currently, the KVM code assumes that if the host kernel is using the
      XIVE interrupt controller (the new interrupt controller that first
      appeared in POWER9 systems), then the in-kernel XICS emulation will
      use the XIVE hardware to deliver interrupts to the guest.  However,
      this only works when the host is running in hypervisor mode and has
      full access to all of the XIVE functionality.  It doesn't work in any
      nested virtualization scenario, either with PR KVM or nested-HV KVM,
      because the XICS-on-XIVE code calls directly into the native-XIVE
      routines, which are not initialized and cannot function correctly
      because they use OPAL calls, and OPAL is not available in a guest.
      
      This means that using the in-kernel XICS emulation in a nested
      hypervisor that is using XIVE as its interrupt controller will cause a
      (nested) host kernel crash.  To fix this, we change most of the places
      where the current code calls xive_enabled() to select between the
      XICS-on-XIVE emulation and the plain XICS emulation to call a new
      function, xics_on_xive(), which returns false in a guest.
      
      However, there is a further twist.  The plain XICS emulation has some
      functions which are used in real mode and access the underlying XICS
      controller (the interrupt controller of the host) directly.  In the
      case of a nested hypervisor, this means doing XICS hypercalls
      directly.  When the nested host is using XIVE as its interrupt
      controller, these hypercalls will fail.  Therefore this also adds
      checks in the places where the XICS emulation wants to access the
      underlying interrupt controller directly, and if that is XIVE, makes
      the code use the virtual mode fallback paths, which call generic
      kernel infrastructure rather than doing direct XICS access.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      03f95332
    • M
      KVM: PPC: Remove -I. header search paths · f1adb9c4
      Masahiro Yamada 提交于
      The header search path -I. in kernel Makefiles is very suspicious;
      it allows the compiler to search for headers in the top of $(srctree),
      where obviously no header file exists.
      
      Commit 46f43c6e ("KVM: powerpc: convert marker probes to event
      trace") first added these options, but they are completely useless.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      f1adb9c4
    • W
      KVM: PPC: Book3S HV: Replace kmalloc_node+memset with kzalloc_node · 08434ab4
      wangbo 提交于
      Replace kmalloc_node and memset with kzalloc_node
      Signed-off-by: Nwangbo <wang.bo116@zte.com.cn>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      08434ab4
    • P
      KVM: PPC: Book3S PR: Add emulation for slbfee. instruction · 41a8645a
      Paul Mackerras 提交于
      Recent kernels, since commit e15a4fea ("powerpc/64s/hash: Add
      some SLB debugging tests", 2018-10-03) use the slbfee. instruction,
      which PR KVM currently does not have code to emulate.  Consequently
      recent kernels fail to boot under PR KVM.  This adds emulation of
      slbfee., enabling these kernels to boot successfully.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      41a8645a
  5. 01 2月, 2019 1 次提交
    • O
      powerpc/papr_scm: Use the correct bind address · 5a3840a4
      Oliver O'Halloran 提交于
      When binding an SCM volume to a physical address the hypervisor has the
      option to return early with a continue token with the expectation that
      the guest will resume the bind operation until it completes. A quirk of
      this interface is that the bind address will only be returned by the
      first bind h-call and the subsequent calls will return
      0xFFFF_FFFF_FFFF_FFFF for the bind address.
      
      We currently do not save the address returned by the first h-call. As a
      result we will use the junk address as the base of the bound region if
      the hypervisor decides to split the bind across multiple h-calls. This
      bug was found when testing with very large SCM volumes where the bind
      process would take more time than they hypervisor's internal h-call time
      limit would allow. This patch fixes the issue by saving the bind address
      from the first call.
      
      Cc: stable@vger.kernel.org
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5a3840a4
  6. 31 1月, 2019 1 次提交
    • A
      powerpc/radix: Fix kernel crash with mremap() · 579b9239
      Aneesh Kumar K.V 提交于
      With support for split pmd lock, we use pmd page pmd_huge_pte pointer
      to store the deposited page table. In those config when we move page
      tables we need to make sure we move the deposited page table to the
      correct pmd page. Otherwise this can result in crash when we withdraw
      of deposited page table because we can find the pmd_huge_pte NULL.
      
      eg:
      
        __split_huge_pmd+0x1070/0x1940
        __split_huge_pmd+0xe34/0x1940 (unreliable)
        vma_adjust_trans_huge+0x110/0x1c0
        __vma_adjust+0x2b4/0x9b0
        __split_vma+0x1b8/0x280
        __do_munmap+0x13c/0x550
        sys_mremap+0x220/0x7e0
        system_call+0x5c/0x70
      
      Fixes: 675d9952 ("powerpc/book3s64: Enable split pmd ptlock.")
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      579b9239
  7. 20 1月, 2019 1 次提交
  8. 15 1月, 2019 2 次提交
    • M
      powerpc/syscalls: Fix syscall tracing · 7bea7ac0
      Michael Ellerman 提交于
      Recently in commit fbf508da ("powerpc: split compat syscall table
      out from native table") we changed the layout of the system call
      table. Instead of having two entries for each syscall number, one for
      the regular entry point and one for the compat entry point, we now
      have separate tables for regular and compat entry points.
      
      This inadvertently broke syscall tracing (CONFIG_FTRACE_SYSCALLS),
      because our implementation of arch_syscall_addr() knew about the
      layout of the table (it did nr * 2).
      
      We can fix it just by dropping our version of arch_syscall_addr() and
      using the generic version which does:
      
      	return (unsigned long)sys_call_table[nr];
      
      Fixes: fbf508da ("powerpc: split compat syscall table out from native table")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7bea7ac0
    • J
      powerpc/pseries: Fix build break due to pnv_npu2_init() · da727097
      Jason A. Donenfeld 提交于
      Commit 3be2df00 ("powerpc/pseries/npu: Enable platform support")
      added a call to pnv_npu2_init() in pseries code. This causes a build
      break if we build with CONFIG_PPC_PSERIES && !CONFIG_PPC_POWERNV:
      
        powerpc64le-pc-linux-gnu-ld: arch/powerpc/platforms/pseries/pci.o: in function `pSeries_final_fixup':
        pci.c:(.init.text+0x1b0): undefined reference to `pnv_npu2_init'
      
      This commit therefore wraps that line in an ifdef, so that pseries
      builds without powernv.
      
      Fixes: 3be2df00 ("powerpc/pseries/npu: Enable platform support")
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      [mpe: Frob change log a bit to blame a different commit]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      da727097
  9. 11 1月, 2019 5 次提交
  10. 08 1月, 2019 2 次提交
  11. 06 1月, 2019 3 次提交
  12. 05 1月, 2019 2 次提交
    • J
      mm: treewide: remove unused address argument from pte_alloc functions · 4cf58924
      Joel Fernandes (Google) 提交于
      Patch series "Add support for fast mremap".
      
      This series speeds up the mremap(2) syscall by copying page tables at
      the PMD level even for non-THP systems.  There is concern that the extra
      'address' argument that mremap passes to pte_alloc may do something
      subtle architecture related in the future that may make the scheme not
      work.  Also we find that there is no point in passing the 'address' to
      pte_alloc since its unused.  This patch therefore removes this argument
      tree-wide resulting in a nice negative diff as well.  Also ensuring
      along the way that the enabled architectures do not do anything funky
      with the 'address' argument that goes unnoticed by the optimization.
      
      Build and boot tested on x86-64.  Build tested on arm64.  The config
      enablement patch for arm64 will be posted in the future after more
      testing.
      
      The changes were obtained by applying the following Coccinelle script.
      (thanks Julia for answering all Coccinelle questions!).
      Following fix ups were done manually:
      * Removal of address argument from  pte_fragment_alloc
      * Removal of pte_alloc_one_fast definitions from m68k and microblaze.
      
      // Options: --include-headers --no-includes
      // Note: I split the 'identifier fn' line, so if you are manually
      // running it, please unsplit it so it runs for you.
      
      virtual patch
      
      @pte_alloc_func_def depends on patch exists@
      identifier E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      type T2;
      @@
      
       fn(...
      - , T2 E2
       )
       { ... }
      
      @pte_alloc_func_proto_noarg depends on patch exists@
      type T1, T2, T3, T4;
      identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1, T2);
      + T3 fn(T1);
      |
      - T3 fn(T1, T2, T4);
      + T3 fn(T1, T2);
      )
      
      @pte_alloc_func_proto depends on patch exists@
      identifier E1, E2, E4;
      type T1, T2, T3, T4;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1 E1, T2 E2);
      + T3 fn(T1 E1);
      |
      - T3 fn(T1 E1, T2 E2, T4 E4);
      + T3 fn(T1 E1, T2 E2);
      )
      
      @pte_alloc_func_call depends on patch exists@
      expression E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
       fn(...
      -,  E2
       )
      
      @pte_alloc_macro depends on patch exists@
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      identifier a, b, c;
      expression e;
      position p;
      @@
      
      (
      - #define fn(a, b, c) e
      + #define fn(a, b) e
      |
      - #define fn(a, b) e
      + #define fn(a) e
      )
      
      Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.comSigned-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Suggested-by: NKirill A. Shutemov <kirill@shutemov.name>
      Acked-by: NKirill A. Shutemov <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4cf58924
    • L
      Fix access_ok() fallout for sparc32 and powerpc · 4caf4ebf
      Linus Torvalds 提交于
      These two architectures actually had an intentional use of the 'type'
      argument to access_ok() just to avoid warnings.
      
      I had actually noticed the powerpc one, but forgot to then fix it up.
      And I missed the sparc32 case entirely.
      
      This is hopefully all of it.
      Reported-by: NMathieu Malaterre <malat@debian.org>
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Fixes: 96d4f267 ("Remove 'type' argument from access_ok() function")
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4caf4ebf
  13. 04 1月, 2019 2 次提交
    • M
      powerpc: Drop use of 'type' from access_ok() · 074400a7
      Mathieu Malaterre 提交于
      In commit 05a4ab82 ("powerpc/uaccess: fix warning/error with
      access_ok()") an attempt was made to remove a warning by referencing
      the variable `type`. However in commit 96d4f267 ("Remove 'type'
      argument from access_ok() function") the variable `type` has been
      removed, breaking the build:
      
        arch/powerpc/include/asm/uaccess.h:66:32: error: ‘type’ undeclared (first use in this function)
      
      This essentially reverts commit 05a4ab82 ("powerpc/uaccess: fix
      warning/error with access_ok()") to fix the error.
      
      Fixes: 96d4f267 ("Remove 'type' argument from access_ok() function")
      Signed-off-by: NMathieu Malaterre <malat@debian.org>
      [mpe: Reword change log slightly.]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      074400a7
    • L
      Remove 'type' argument from access_ok() function · 96d4f267
      Linus Torvalds 提交于
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96d4f267
  14. 30 12月, 2018 3 次提交
    • C
      kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops · cc028297
      Christophe Leroy 提交于
      checkpatch.pl reports the following:
      
        WARNING: struct kgdb_arch should normally be const
        #28: FILE: arch/mips/kernel/kgdb.c:397:
        +struct kgdb_arch arch_kgdb_ops = {
      
      This report makes sense, as all other ops struct, this
      one should also be const. This patch does the change.
      
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: x86@kernel.org
      Acked-by: NDaniel Thompson <daniel.thompson@linaro.org>
      Acked-by: NPaul Burton <paul.burton@mips.com>
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org>
      cc028297
    • D
      kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function() · 3cd99ac3
      Douglas Anderson 提交于
      When I had lockdep turned on and dropped into kgdb I got a nice splat
      on my system.  Specifically it hit:
        DEBUG_LOCKS_WARN_ON(current->hardirq_context)
      
      Specifically it looked like this:
        sysrq: SysRq : DEBUG
        ------------[ cut here ]------------
        DEBUG_LOCKS_WARN_ON(current->hardirq_context)
        WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
        pstate: 604003c9 (nZCv DAIF +PAN -UAO)
        pc : lockdep_hardirqs_on+0xf0/0x160
        ...
        Call trace:
         lockdep_hardirqs_on+0xf0/0x160
         trace_hardirqs_on+0x188/0x1ac
         kgdb_roundup_cpus+0x14/0x3c
         kgdb_cpu_enter+0x53c/0x5cc
         kgdb_handle_exception+0x180/0x1d4
         kgdb_compiled_brk_fn+0x30/0x3c
         brk_handler+0x134/0x178
         do_debug_exception+0xfc/0x178
         el1_dbg+0x18/0x78
         kgdb_breakpoint+0x34/0x58
         sysrq_handle_dbg+0x54/0x5c
         __handle_sysrq+0x114/0x21c
         handle_sysrq+0x30/0x3c
         qcom_geni_serial_isr+0x2dc/0x30c
        ...
        ...
        irq event stamp: ...45
        hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
        hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
        softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
        softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
        ---[ end trace adf21f830c46e638 ]---
      
      Looking closely at it, it seems like a really bad idea to be calling
      local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
      like it could violate spinlock semantics and cause a deadlock.
      
      Instead, let's use a private csd alongside
      smp_call_function_single_async() to round up the other CPUs.  Using
      smp_call_function_single_async() doesn't require interrupts to be
      enabled so we can remove the offending bit of code.
      
      In order to avoid duplicating this across all the architectures that
      use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
      to debug_core.c.
      
      Looking at all the people who previously had copies of this code,
      there were a few variants.  I've attempted to keep the variants
      working like they used to.  Specifically:
      * For arch/arc we passed NULL to kgdb_nmicallback() instead of
        get_irq_regs().
      * For arch/mips there was a bit of extra code around
        kgdb_nmicallback()
      
      NOTE: In this patch we will still get into trouble if we try to round
      up a CPU that failed to round up before.  We'll try to round it up
      again and potentially hang when we try to grab the csd lock.  That's
      not new behavior but we'll still try to do better in a future patch.
      Suggested-by: NDaniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org>
      3cd99ac3
    • D
      kgdb: Remove irq flags from roundup · 9ef7fa50
      Douglas Anderson 提交于
      The function kgdb_roundup_cpus() was passed a parameter that was
      documented as:
      
      > the flags that will be used when restoring the interrupts. There is
      > local_irq_save() call before kgdb_roundup_cpus().
      
      Nobody used those flags.  Anyone who wanted to temporarily turn on
      interrupts just did local_irq_enable() and local_irq_disable() without
      looking at them.  So we can definitely remove the flags.
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org>
      9ef7fa50