1. 28 Aug, 2020 7 commits
  2. 27 Aug, 2020 7 commits
  3. 24 Aug, 2020 3 commits
  4. 22 Aug, 2020 3 commits
    • KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set · b5331379
      Committed by Will Deacon
      When an MMU notifier call results in unmapping a range that spans multiple
      PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
      since this avoids running into RCU stalls during VM teardown. Unfortunately,
      if the VM is destroyed as a result of OOM, then blocking is not permitted
      and the call to the scheduler triggers the following BUG():
      
       | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
       | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
       | INFO: lockdep is turned off.
       | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
       | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
       | Call trace:
       |  dump_backtrace+0x0/0x284
       |  show_stack+0x1c/0x28
       |  dump_stack+0xf0/0x1a4
       |  ___might_sleep+0x2bc/0x2cc
       |  unmap_stage2_range+0x160/0x1ac
       |  kvm_unmap_hva_range+0x1a0/0x1c8
       |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
       |  __mmu_notifier_invalidate_range_start+0x218/0x31c
       |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
       |  __oom_reap_task_mm+0x128/0x268
       |  oom_reap_task+0xac/0x298
       |  oom_reaper+0x178/0x17c
       |  kthread+0x1e4/0x1fc
       |  ret_from_fork+0x10/0x30
      
      Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
      only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
      flags.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 8b3405e3 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-3-will@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
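The rule described in this commit message (yield at a PGD boundary only when the notifier flags allow blocking) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code: `model_unmap_range()` and `model_cond_resched_lock()` are hypothetical names, the flag is redefined locally with an assumed value, and the "reschedule" is reduced to a counter.

```c
#include <stdbool.h>

/* Redefined locally so the sketch builds in userspace; the value is an assumption. */
#define MMU_NOTIFIER_RANGE_BLOCKABLE (1u << 0)

/* Hypothetical stand-in for cond_resched_lock(): it only counts calls. */
static unsigned int resched_calls;

static void model_cond_resched_lock(void)
{
    resched_calls++;
}

/*
 * Hypothetical model of the unmap loop: walk [start, end) in chunks that
 * end on pgd_size boundaries, and yield between chunks only when the
 * caller is allowed to block. pgd_size must be non-zero.
 */
static void model_unmap_range(unsigned long start, unsigned long end,
                              unsigned long pgd_size, unsigned int flags)
{
    bool may_block = (flags & MMU_NOTIFIER_RANGE_BLOCKABLE) != 0;
    unsigned long addr = start;

    while (addr < end) {
        unsigned long next = (addr / pgd_size + 1) * pgd_size;

        if (next > end)
            next = end;

        /* ... the real code would unmap [addr, next) here ... */

        /* Crossing a boundary: reschedule only if blocking is permitted. */
        if (may_block && next != end)
            model_cond_resched_lock();

        addr = next;
    }
}
```

With a non-blockable caller such as the OOM reaper path in the trace above, the flag is clear and the model never reaches the yield point.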
    • KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() · fdfe7cbd
      Committed by Will Deacon
      The 'flags' field of 'struct mmu_notifier_range' is used to indicate
      whether invalidate_range_{start,end}() are permitted to block. In the
      case of kvm_mmu_notifier_invalidate_range_start(), this field is not
      forwarded on to the architecture-specific implementation of
      kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
      whether or not to block.
      
      Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
      architectures are aware as to whether or not they are permitted to block.
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-2-will@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
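As a rough illustration of the interface change described here, the following userspace sketch forwards the range flags from a generic notifier hook to an architecture backend. All names prefixed with `model_` are hypothetical, the struct is a trimmed-down stand-in for `struct mmu_notifier_range`, and the flag value is an assumption.

```c
#include <stdbool.h>

/* Redefined locally so the sketch builds in userspace; the value is an assumption. */
#define MMU_NOTIFIER_RANGE_BLOCKABLE (1u << 0)

/* Hypothetical, trimmed-down view of a notifier range. */
struct model_range {
    unsigned long start;
    unsigned long end;
    unsigned int flags;
};

/* Records what the hypothetical architecture backend was told. */
static bool backend_saw_blockable;

/* Hypothetical backend: with the extra parameter it can see the flags. */
static int model_arch_unmap_hva_range(unsigned long start, unsigned long end,
                                      unsigned int flags)
{
    (void)start;
    (void)end;
    backend_saw_blockable = (flags & MMU_NOTIFIER_RANGE_BLOCKABLE) != 0;
    return 0;
}

/* Hypothetical generic hook: passes range->flags on instead of dropping it. */
static int model_invalidate_range_start(const struct model_range *range)
{
    return model_arch_unmap_hva_range(range->start, range->end, range->flags);
}
```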
    • ARM64: vdso32: Install vdso32 from vdso_install · 8d75785a
      Committed by Stephen Boyd
      Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
      vdso_install' installs the 32-bit compat vdso when it is compiled.
      
      Fixes: a7f71a2c ("arm64: compat: Add vDSO")
      Signed-off-by: Stephen Boyd <swboyd@chromium.org>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Acked-by: Will Deacon <will@kernel.org>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Link: https://lore.kernel.org/r/20200818014950.42492-1-swboyd@chromium.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  5. 21 Aug, 2020 9 commits
  6. 20 Aug, 2020 7 commits
  7. 18 Aug, 2020 4 commits
    • powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death · 801980f6
      Committed by Michael Roth
      For a power9 KVM guest with XIVE enabled, running a test loop
      where we hotplug 384 vcpus and then unplug them, the following traces
      can be seen (generally within a few loops) either from the unplugged
      vcpu:
      
        cpu 65 (hwid 65) Ready to die...
        Querying DEAD? cpu 66 (66) shows 2
        list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
        ------------[ cut here ]------------
        kernel BUG at lib/list_debug.c:56!
        Oops: Exception in kernel mode, sig: 5 [#1]
        LE SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ...
        CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1
        NIP:  c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac
        REGS: c0000009e5a17840 TRAP: 0700   Not tainted  (4.18.0-221.el8.ppc64le)
        MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28000842  XER: 20040000
        ...
        NIP __list_del_entry_valid+0xac/0x100
        LR  __list_del_entry_valid+0xa8/0x100
        Call Trace:
          __list_del_entry_valid+0xa8/0x100 (unreliable)
          free_pcppages_bulk+0x1f8/0x940
          free_unref_page+0xd0/0x100
          xive_spapr_cleanup_queue+0x148/0x1b0
          xive_teardown_cpu+0x1bc/0x240
          pseries_mach_cpu_die+0x78/0x2f0
          cpu_die+0x48/0x70
          arch_cpu_idle_dead+0x20/0x40
          do_idle+0x2f4/0x4c0
          cpu_startup_entry+0x38/0x40
          start_secondary+0x7bc/0x8f0
          start_secondary_prolog+0x10/0x14
      
      or on the worker thread handling the unplug:
      
        pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
        Querying DEAD? cpu 314 (314) shows 2
        BUG: Bad page state in process kworker/u768:3  pfn:95de1
        cpu 314 (hwid 314) Ready to die...
        page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
        flags: 0x5ffffc00000000()
        raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000
        raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
        page dumped because: nonzero mapcount
        Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ...
        CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
        Workqueue: pseries hotplug workque pseries_hp_work_fn
        Call Trace:
          dump_stack+0xb0/0xf4 (unreliable)
          bad_page+0x12c/0x1b0
          free_pcppages_bulk+0x5bc/0x940
          page_alloc_cpu_dead+0x118/0x120
          cpuhp_invoke_callback.constprop.5+0xb8/0x760
          _cpu_down+0x188/0x340
          cpu_down+0x5c/0xa0
          cpu_subsys_offline+0x24/0x40
          device_offline+0xf0/0x130
          dlpar_offline_cpu+0x1c4/0x2a0
          dlpar_cpu_remove+0xb8/0x190
          dlpar_cpu_remove_by_index+0x12c/0x150
          dlpar_cpu+0x94/0x800
          pseries_hp_work_fn+0x128/0x1e0
          process_one_work+0x304/0x5d0
          worker_thread+0xcc/0x7a0
          kthread+0x1ac/0x1c0
          ret_from_kernel_thread+0x5c/0x80
      
      The latter trace is due to the following sequence:
      
        page_alloc_cpu_dead
          drain_pages
            drain_pages_zone
              free_pcppages_bulk
      
      where drain_pages() in this case is called under the assumption that
      the unplugged cpu is no longer executing. To ensure that is the case,
      an early call is made to __cpu_die()->pseries_cpu_die(), which runs a
      loop that waits for the cpu to reach a halted state by polling its
      status via query-cpu-stopped-state RTAS calls. It only polls for 25
      iterations before giving up, however, and in the trace above this
      results in the following being printed only 0.1 seconds after the
      hotplug worker thread begins processing the unplug request:
      
        pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
        Querying DEAD? cpu 314 (314) shows 2
      
      At that point the worker thread assumes the unplugged CPU is in some
      unknown/dead state and proceeds with the cleanup, causing the race
      with the XIVE cleanup code executed by the unplugged CPU.
      
      Fix this by waiting indefinitely, but also making an effort to avoid
      spurious lockup messages by allowing for rescheduling after polling
      the CPU status and printing a warning if we wait for longer than 120s.
      
      Fixes: eac1e731 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Tested-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      [mpe: Trim oopses in change log slightly for readability]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200811161544.10513-1-mdroth@linux.vnet.ibm.com
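The waiting strategy described in this commit message (no iteration cap, a chance to reschedule after each poll, and a warning once the wait exceeds 120s) can be modelled in userspace with a simulated clock. This is a sketch, not the kernel implementation: every `model_` name is hypothetical, one loop iteration stands for one second, and repeating the warning every further 120s is an assumption the message does not spell out.

```c
#include <stdbool.h>

#define MODEL_WARN_INTERVAL_S 120u /* threshold taken from the commit message */

/* Hypothetical CPU that reports "stopped" once the clock reaches stop_at_s. */
struct model_cpu {
    unsigned int stop_at_s;
};

/* Hypothetical stand-in for querying the stopped state of the CPU. */
static bool model_cpu_stopped(const struct model_cpu *cpu, unsigned int now_s)
{
    return now_s >= cpu->stop_at_s;
}

/*
 * Poll until the CPU reports that it has stopped, with no upper bound on
 * the number of polls. Returns how many warnings would have been printed
 * and stores the total simulated wait in *waited_s.
 */
static unsigned int model_wait_for_cpu_death(const struct model_cpu *cpu,
                                             unsigned int *waited_s)
{
    unsigned int now_s = 0;
    unsigned int warnings = 0;
    unsigned int next_warn_s = MODEL_WARN_INTERVAL_S;

    while (!model_cpu_stopped(cpu, now_s)) {
        if (now_s > next_warn_s) {
            warnings++; /* a real implementation would print a warning here */
            next_warn_s += MODEL_WARN_INTERVAL_S;
        }
        /* a real implementation would let the scheduler run here */
        now_s++;
    }

    *waited_s = now_s;
    return warnings;
}
```

The point of the design is that the caller never proceeds with cleanup on a guess: the loop ends only when the status source says the CPU has stopped.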
    • powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined · 7bee31ad
      Committed by Christophe Leroy
      When MODULES_VADDR is defined, is_module_segment() shall check the
      address against it instead of checking against VMALLOC_START.
      
      Fixes: 6ca05532 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/07884ed033c31e074747b7eb8eaa329d15db07ec.1596641219.git.christophe.leroy@csgroup.eu
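The check described in this commit message can be sketched as a range test that uses the module window when one is defined and falls back to the vmalloc window otherwise. This is an illustrative userspace model only: the `MODEL_` addresses are made-up placeholder values, the 256M segment granularity is assumed for the alignment, and `model_is_module_segment()` is a hypothetical name.

```c
#include <stdbool.h>

#define MODEL_SZ_256M 0x10000000UL /* assumed segment size */

/* Placeholder addresses for illustration only; not real platform values. */
#define MODEL_HAVE_MODULES_VADDR 1
#define MODEL_MODULES_VADDR 0xb0000000UL
#define MODEL_MODULES_END   0xc0000000UL
#define MODEL_VMALLOC_START 0xd1000000UL
#define MODEL_VMALLOC_END   0xf0000000UL

static unsigned long model_align_down(unsigned long addr)
{
    return addr & ~(MODEL_SZ_256M - 1);
}

static unsigned long model_align_up(unsigned long addr)
{
    return (addr + MODEL_SZ_256M - 1) & ~(MODEL_SZ_256M - 1);
}

/* True if addr lies in a segment that may hold module text. */
static bool model_is_module_segment(unsigned long addr)
{
#if MODEL_HAVE_MODULES_VADDR
    /* A dedicated module area exists: compare against it. */
    unsigned long lo = MODEL_MODULES_VADDR;
    unsigned long hi = MODEL_MODULES_END;
#else
    /* No dedicated area: modules share the vmalloc range. */
    unsigned long lo = MODEL_VMALLOC_START;
    unsigned long hi = MODEL_VMALLOC_END;
#endif

    return addr >= model_align_down(lo) && addr < model_align_up(hi);
}
```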
    • powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32 · 48d2f040
      Committed by Christophe Leroy
      On BOOK3S_32, when we have modules and strict kernel RWX, modules
      are not in vmalloc space but in a dedicated segment that is
      below PAGE_OFFSET.
      
      So KASAN_SHADOW_START must take it into account.
      
      MODULES_VADDR can't be used because it is not defined yet
      in kasan.h
      
      Fixes: 6ca05532 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/6eddca2d5611fd57312a88eae31278c87a8fc99d.1596641224.git.christophe.leroy@csgroup.eu
    • kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode · cb957adb
      Committed by Jim Mattson
      See the SDM, volume 3, section 4.4.1:
      
      If PAE paging would be in use following an execution of MOV to CR0 or
      MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
      CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
      the PDPTEs are loaded from the address in CR3.
      
      Fixes: b9baba86 ("KVM, pkeys: expose CPUID/CR4 to guest")
      Cc: Huaitong Han <huaitong.han@intel.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Reviewed-by: Oliver Upton <oupton@google.com>
      Message-Id: <20200817181655.3716509-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
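The SDM rule quoted in this commit message can be expressed as a bit mask over the CR4 bits it lists, with CR4.PKE deliberately left out. The following is a userspace sketch, not KVM code: `model_cr4_write_reloads_pdptes()` and the `MODEL_` macros are hypothetical names, only the CR4 half of the rule is modelled, and whether PAE paging is in use after the write is passed in by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4 bit positions as documented in the SDM. */
#define MODEL_CR4_PSE  (UINT64_C(1) << 4)
#define MODEL_CR4_PAE  (UINT64_C(1) << 5)
#define MODEL_CR4_PGE  (UINT64_C(1) << 7)
#define MODEL_CR4_SMEP (UINT64_C(1) << 20)
#define MODEL_CR4_PKE  (UINT64_C(1) << 22)

/* CR4 bits named in the quoted SDM text; PKE is not among them. */
#define MODEL_CR4_PDPTE_RELOAD_BITS \
    (MODEL_CR4_PAE | MODEL_CR4_PGE | MODEL_CR4_PSE | MODEL_CR4_SMEP)

/*
 * Returns true if a MOV to CR4 changing old_cr4 into new_cr4 would load
 * the PDPTEs, given whether PAE paging is in use after the write.
 */
static bool model_cr4_write_reloads_pdptes(uint64_t old_cr4, uint64_t new_cr4,
                                           bool pae_paging_after)
{
    uint64_t changed = old_cr4 ^ new_cr4;

    return pae_paging_after && (changed & MODEL_CR4_PDPTE_RELOAD_BITS) != 0;
}
```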