1. 28 Nov, 2017 6 commits
    • J
      KVM: Let KVM_SET_SIGNAL_MASK work as advertised · 20b7035c
      Jan H. Schönherr committed
      KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
      "any unblocked signal received [...] will cause KVM_RUN to return with
      -EINTR" and that "the signal will only be delivered if not blocked by
      the original signal mask".
      
      This, however, is only true when the calling task has a signal handler
      registered for a signal. If not, signal evaluation is short-circuited for
      SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
      returning or the whole process is terminated.
      
      Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
      to that in do_sigtimedwait() to avoid short-circuiting of signals.
      Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      20b7035c
    • W
      KVM: VMX: Fix vmx->nested freeing when no SMI handler · b7455825
      Wanpeng Li committed
      Reported by syzkaller:
      
         ------------[ cut here ]------------
         WARNING: CPU: 5 PID: 2939 at arch/x86/kvm/vmx.c:3844 free_loaded_vmcs+0x77/0x80 [kvm_intel]
         CPU: 5 PID: 2939 Comm: repro Not tainted 4.14.0+ #26
         RIP: 0010:free_loaded_vmcs+0x77/0x80 [kvm_intel]
         Call Trace:
          vmx_free_vcpu+0xda/0x130 [kvm_intel]
          kvm_arch_destroy_vm+0x192/0x290 [kvm]
          kvm_put_kvm+0x262/0x560 [kvm]
          kvm_vm_release+0x2c/0x30 [kvm]
          __fput+0x190/0x370
          task_work_run+0xa1/0xd0
          do_exit+0x4d2/0x13e0
          do_group_exit+0x89/0x140
          get_signal+0x318/0xb80
          do_signal+0x8c/0xb40
          exit_to_usermode_loop+0xe4/0x140
          syscall_return_slowpath+0x206/0x230
          entry_SYSCALL_64_fastpath+0x98/0x9a
      
      The syzkaller testcase executes the VMXON/VMLAUNCH instructions, so the
      vmx->nested state is populated, and it also issues the KVM_SMI ioctl.
      However, the testcase is just a simple C program and is not launched by
      something like SeaBIOS, which implements an SMI handler. Commit 05cade71
      (KVM: nSVM: fix SMI injection in guest mode) leaves guest mode and sets
      nested.vmxon to false for the duration of SMM, following SDM 34.14.1
      "leave VMX operation" upon entering SMM. We can't alloc/free the
      vmx->nested state each time SMM is entered/exited, since that would add
      overhead. So vmx_pre_enter_smm() marks nested.vmxon false even though the
      vmx->nested state is still populated, expecting em_rsm() to mark
      nested.vmxon true again. However, the SMI handler/RSM never executes,
      since there is nothing like SeaBIOS in this scenario. free_nested() then
      fails to free the vmx->nested state because vmx->nested.vmxon is false,
      which results in the above warning.
      
      This patch fixes it by also considering the no-SMI-handler case. Luckily,
      vmx->nested.smm.vmxon is set from the value of vmx->nested.vmxon in
      vmx_pre_enter_smm(), so we can use it to free the vmx->nested state when
      L1 goes down.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Fixes: 05cade71 (KVM: nSVM: fix SMI injection in guest mode)
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b7455825
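The vmx->nested freeing fix described in commit b7455825 can be illustrated with a small model. This is only a sketch under stated assumptions: `struct nested_state`, its `allocated` flag, and both helper functions are hypothetical stand-ins for the real kernel structures, and only the two flags named in the commit message (`vmxon` and `smm.vmxon`) are taken from the source.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the per-vCPU nested state. */
struct nested_state {
    bool vmxon;      /* cleared on SMM entry, as described in the commit */
    bool smm_vmxon;  /* copy of vmxon saved on SMM entry */
    bool allocated;  /* stands in for the vmx->nested allocations */
};

/* Model of SMM entry: remember vmxon, then "leave VMX operation". */
static void pre_enter_smm(struct nested_state *n)
{
    n->smm_vmxon = n->vmxon;
    n->vmxon = false;
}

/*
 * Model of the fixed teardown: the state is freed if VMX was on either
 * right now or at the time SMM was entered. Checking only vmxon would
 * leak the state when no SMI handler ever executes RSM.
 */
static void free_nested(struct nested_state *n)
{
    if (!n->vmxon && !n->smm_vmxon)
        return;
    n->vmxon = false;
    n->smm_vmxon = false;
    n->allocated = false;
}
```

With this model, a vCPU that entered SMM and never returned still has its nested state released on teardown, because the saved flag is consulted as well.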
    • W
      KVM: VMX: Fix rflags cache during vCPU reset · c37c2873
      Wanpeng Li committed
      Reported by syzkaller:
      
         *** Guest State ***
         CR0: actual=0x0000000080010031, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
         CR4: actual=0x0000000000002061, shadow=0x0000000000000000, gh_mask=ffffffffffffe8f1
         CR3 = 0x000000002081e000
         RSP = 0x000000000000fffa  RIP = 0x0000000000000000
         RFLAGS=0x00023000         DR7 = 0x00000000000000
                ^^^^^^^^^^
         ------------[ cut here ]------------
         WARNING: CPU: 6 PID: 24431 at /home/kernel/linux/arch/x86/kvm//x86.c:7302 kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
         CPU: 6 PID: 24431 Comm: reprotest Tainted: G        W  OE   4.14.0+ #26
         RIP: 0010:kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
         RSP: 0018:ffff880291d179e0 EFLAGS: 00010202
         Call Trace:
          kvm_vcpu_ioctl+0x479/0x880 [kvm]
          do_vfs_ioctl+0x142/0x9a0
          SyS_ioctl+0x74/0x80
          entry_SYSCALL_64_fastpath+0x23/0x9a
      
      The failed vmentry is triggered by the following beautified testcase:
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[5];
          int main()
          {
              struct kvm_debugregs dr = { 0 };
      
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
              struct kvm_guest_debug debug = {
                      .control = 0xf0403,
                      .arch = {
                              .debugreg[6] = 0x2,
                              .debugreg[7] = 0x2
                      }
              };
              ioctl(r[4], KVM_SET_GUEST_DEBUG, &debug);
              ioctl(r[4], KVM_RUN, 0);
          }
      
      The testcase tries to set up the processor-specific debug registers and
      configure the vCPU for handling guest debug events through
      KVM_SET_GUEST_DEBUG. The KVM_SET_GUEST_DEBUG ioctl gets and sets rflags
      in order to set the TF bit if single-stepping is needed. During vCPU
      reset, all register caches are marked available and the GUEST_RFLAGS
      vmcs field is reset to 0x2. However, the rflags cache itself is not
      reset. vmx_get_rflags() therefore returns a stale cache value, because
      the cache is marked available; that value is 0 after boot. Vmentry fails
      if the reserved rflags bit 1 is 0.
      
      This patch fixes it by resetting both the GUEST_RFLAGS vmcs field and
      its cache to 0x2 during vCPU reset.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c37c2873
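The rflags cache problem from commit c37c2873 can be modeled in a few lines. The sketch below is an illustration, not the kernel code: `struct vcpu_model` and all three functions are hypothetical names, and the only facts taken from the commit message are that the vmcs field and its cache are both reset to 0x2 and that vmentry fails when reserved bit 1 is clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define RFLAGS_FIXED_BIT1 0x2ULL  /* reserved bit 1, must be 1 for vmentry */

/* Hypothetical model of a vCPU with a cached copy of a vmcs field. */
struct vcpu_model {
    uint64_t vmcs_guest_rflags;
    uint64_t rflags_cache;
    bool rflags_cache_avail;
};

/* Reset as described by the fix: field and cache are both set to 0x2. */
static void vcpu_reset(struct vcpu_model *v)
{
    v->vmcs_guest_rflags = RFLAGS_FIXED_BIT1;
    v->rflags_cache = RFLAGS_FIXED_BIT1;
    v->rflags_cache_avail = true;
}

/* Cached read: the vmcs field is only consulted when the cache is stale. */
static uint64_t get_rflags(struct vcpu_model *v)
{
    if (!v->rflags_cache_avail) {
        v->rflags_cache = v->vmcs_guest_rflags;
        v->rflags_cache_avail = true;
    }
    return v->rflags_cache;
}

/* The vmentry check named in the commit message. */
static bool rflags_valid_for_vmentry(uint64_t rflags)
{
    return (rflags & RFLAGS_FIXED_BIT1) != 0;
}
```

If `vcpu_reset()` marked the cache available without writing it, `get_rflags()` would return whatever the cache held before (0 after boot), and setting TF on top of that value would produce an rflags image that fails the check.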
    • W
      KVM: X86: Fix softlockup when get the current kvmclock · e70b57a6
      Wanpeng Li committed
       watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [qemu-system-x86:10185]
       CPU: 6 PID: 10185 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc4+ #4
       RIP: 0010:kvm_get_time_scale+0x4e/0xa0 [kvm]
       Call Trace:
        get_time_ref_counter+0x5a/0x80 [kvm]
        kvm_hv_process_stimers+0x120/0x5f0 [kvm]
        kvm_arch_vcpu_ioctl_run+0x4b4/0x1690 [kvm]
        kvm_vcpu_ioctl+0x33a/0x620 [kvm]
        do_vfs_ioctl+0xa1/0x5d0
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x1e/0xa9
      
      This can be reproduced by running kvm-unit-tests/hyperv_stimer.flat and
      a cpu-hotplug stress test simultaneously. When the pCPU is hot-unplugged,
      __this_cpu_read(cpu_tsc_khz) returns 0 (set in kvmclock_cpu_down_prep()),
      which sends kvm_get_time_scale() into an infinite loop.
      
      This patch fixes it by treating the hot-unplugged pCPU as not using the
      master clock.
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e70b57a6
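To see why a TSC frequency of 0 can hang a scale computation like the one in commit e70b57a6, consider the following sketch. It is an assumption-laden illustration: the loop structure is written from memory to resemble a shift/multiplier normalization, the function name `time_scale_shift` and the `max_iters` guard are hypothetical, and the real kernel function has no such guard.

```c
#include <stdint.h>

/*
 * Normalize base_hz against scaled_hz by shifting, as a shift/multiplier
 * scale computation would. Returns 0 and stores the shift on success, or
 * -1 if the loops have not converged after max_iters steps (the guard is
 * an addition for this sketch).
 */
static int time_scale_shift(uint64_t scaled_hz, uint64_t base_hz,
                            int max_iters, int *shift_out)
{
    uint64_t tps64 = base_hz;
    uint64_t scaled64 = scaled_hz;
    uint32_t tps32;
    int shift = 0;
    int iters = 0;

    while (tps64 > scaled64 * 2 || (tps64 & 0xffffffff00000000ULL)) {
        if (++iters > max_iters)
            return -1;
        tps64 >>= 1;
        shift--;
    }

    tps32 = (uint32_t)tps64;
    while (tps32 <= scaled64 || (scaled64 & 0xffffffff00000000ULL)) {
        if (++iters > max_iters)
            return -1;
        if ((scaled64 & 0xffffffff00000000ULL) || (tps32 & 0x80000000U))
            scaled64 >>= 1;
        else
            tps32 <<= 1;  /* 0 << 1 stays 0, so base_hz == 0 never converges */
        shift++;
    }

    *shift_out = shift;
    return 0;
}
```

With a base frequency of 0, the second loop keeps doubling a zero and its exit condition never becomes true, which matches the soft lockup in the trace. The commit avoids reaching this computation for a hot-unplugged pCPU rather than bounding the loop.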
    • D
      KVM: lapic: Fixup LDR on load in x2apic · 12806ba9
      Dr. David Alan Gilbert committed
      In x2apic mode the LDR is fixed based on the ID rather
      than separately loadable like it was before x2.
      When kvm_apic_set_state is called, the base is set, and if
      it has the X2APIC_ENABLE flag set then the LDR is calculated;
      however, that value is then overwritten by the memcpy a few lines
      below with the value that came from userland.
      
      The symptom is a lack of EOI after loading the state
      (e.g. after a QEMU migration) and is due to the EOI bitmap
      being wrong due to the incorrect LDR.  This was seen with
      a Win2016 guest under Qemu with irqchip=split whose USB mouse
      didn't work after a VM migration.
      
      This corresponds to RH bug:
        https://bugzilla.redhat.com/show_bug.cgi?id=1502591
      Reported-by: Yiqian Wei <yiwei@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: stable@vger.kernel.org
      [Applied fixup from Liran Alon. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      12806ba9
    • D
      KVM: lapic: Split out x2apic ldr calculation · e872fa94
      Dr. David Alan Gilbert committed
      Split out the ldr calculation from kvm_apic_set_x2apic_id
      since we're about to reuse it in the following patch.
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e872fa94
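For the two lapic commits above, the LDR derivation in x2apic mode can be sketched as follows. The formula (cluster ID in the upper 16 bits, a one-hot bit for the position within the cluster in the lower 16 bits) follows the x2APIC logical destination layout; the function names and the `ldr_after_load` helper are hypothetical and only illustrate the idea of recomputing the LDR after userspace state has been copied in.

```c
#include <stdint.h>

/* x2APIC logical ID: cluster = id >> 4, bit position = id & 0xf. */
static uint32_t x2apic_ldr_from_id(uint32_t id)
{
    return ((id >> 4) << 16) | (1u << (id & 0xf));
}

/*
 * Hypothetical model of the load path: in x2apic mode the LDR supplied
 * by userspace is ignored and recomputed from the ID; otherwise the
 * userspace value is kept.
 */
static uint32_t ldr_after_load(uint32_t userspace_ldr, int x2apic_enabled,
                               uint32_t id)
{
    return x2apic_enabled ? x2apic_ldr_from_id(id) : userspace_ldr;
}
```

For example, APIC ID 0x11 belongs to cluster 1 at position 1, giving an LDR of 0x00010002 regardless of what the saved state contained.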
  2. 17 Nov, 2017 34 commits
    • P
      KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP · c4ad77e0
      Paolo Bonzini committed
      These bits were not defined until now in common code, but they are
      now that the kernel supports UMIP too.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c4ad77e0
    • J
      KVM: x86: Fix CPUID function for word 6 (80000001_ECX) · 50a671d4
      Janakarajan Natarajan committed
      The function for CPUID 80000001 ECX is set to 0xc0000001. Set it to
      0x80000001.
      Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Fixes: d6321d49 ("KVM: x86: generalize guest_cpuid_has_ helpers")
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      50a671d4
    • L
      KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2 · 917dc606
      Liran Alon committed
      vmx_check_nested_events() should return -EBUSY only when there is a
      pending L1 event that requires a VMExit from L2 to L1 but such a
      VMExit is currently blocked. Such VMExits are blocked either
      because nested_run_pending=1 or because an event was reinjected to L2.
      vmx_check_nested_events() should return 0 when there are no
      pending L1 events that require a VMExit from L2 to L1, or when
      a VMExit from L2 to L1 was done internally.
      
      However, the upstream commit that introduced blocking when an event was
      reinjected to L2 (commit acc9ab60 ("KVM: nVMX: Fix pending events
      injection")) contains a bug: it returns -EBUSY even if there are no
      pending L1 events that require a VMExit from L2 to L1.
      
      This commit fixes this issue.
      
      Fixes: acc9ab60 ("KVM: nVMX: Fix pending events injection")
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      917dc606
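The return-value contract described for vmx_check_nested_events() in commit 917dc606 can be written down as a small decision function. This is a sketch of the contract only: the function name and its three boolean parameters are hypothetical simplifications, and the real function inspects several kinds of pending events.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Returns -EBUSY only when an L1 event needs a VMExit from L2 to L1 and
 * that VMExit is currently blocked; returns 0 when nothing is pending or
 * when the VMExit could be performed.
 */
static int check_nested_events(bool l1_event_needs_vmexit,
                               bool nested_run_pending,
                               bool event_reinjected)
{
    bool block_vmexit = nested_run_pending || event_reinjected;

    if (!l1_event_needs_vmexit)
        return 0;       /* the buggy version could return -EBUSY here */
    if (block_vmexit)
        return -EBUSY;
    /* The VMExit from L2 to L1 would be performed at this point. */
    return 0;
}
```

The key point is the order of the checks: whether a VMExit is blocked only matters once an L1 event that needs one has been found.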
    • N
      KVM: x86: ioapic: Preserve read-only values in the redirection table · b200dded
      Nikita Leshenko committed
      According to the 82093AA (IOAPIC) manual, Remote IRR and Delivery Status are
      read-only. QEMU implements the bits as RO in commit 479c2a1cb7fb
      ("ioapic: keep RO bits for IOAPIC entry").
      Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Steve Rutherford <srutherford@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      b200dded
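Preserving read-only bits on a redirection table write, as commit b200dded describes, amounts to a masked merge. The sketch below assumes the usual 82093AA entry layout, with Delivery Status at bit 12 and Remote IRR at bit 14; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define IOAPIC_DELIVERY_STATUS (1ULL << 12)  /* read-only */
#define IOAPIC_REMOTE_IRR      (1ULL << 14)  /* read-only */
#define IOAPIC_RO_BITS (IOAPIC_DELIVERY_STATUS | IOAPIC_REMOTE_IRR)

/* Take writable bits from the guest's value, read-only bits from the old entry. */
static uint64_t ioapic_write_entry(uint64_t old_entry, uint64_t written)
{
    return (written & ~IOAPIC_RO_BITS) | (old_entry & IOAPIC_RO_BITS);
}
```

A guest write can therefore neither set nor clear Remote IRR directly; the bit only changes through the device's own state transitions.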
    • N
      KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered · a8bfec29
      Nikita Leshenko committed
      Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
      IOAPICs without an EOI register. They simulate the EOI message manually
      by changing the trigger mode to edge and then back to level, with the
      entry being masked during this.
      
      QEMU implements this feature in commit ed1263c363c9
      ("ioapic: clear remote irr bit for edge-triggered interrupts")
      
      As a side effect, this commit removes an incorrect behavior where Remote
      IRR was cleared when the redirection table entry was rewritten. This is not
      consistent with the manual and also opens an opportunity for a strange
      behavior when a redirection table entry is modified from an interrupt
      handler that handles the same entry: The modification will clear the
      Remote IRR bit even though the interrupt handler is still running.
      Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Steve Rutherford <srutherford@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      a8bfec29
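The manual-EOI trick from commit a8bfec29 (switch the entry to edge while masked, then back to level) can be followed step by step with a small model. It assumes the usual 82093AA bit positions (Remote IRR at bit 14, trigger mode at bit 15, mask at bit 16); the names are hypothetical and Delivery Status handling is left out for brevity.

```c
#include <stdint.h>

#define RTE_REMOTE_IRR    (1ULL << 14)  /* read-only for the guest */
#define RTE_TRIGGER_LEVEL (1ULL << 15)  /* 1 = level, 0 = edge */
#define RTE_MASKED        (1ULL << 16)

/*
 * Apply a guest write to a redirection table entry: Remote IRR is kept
 * from the old entry, then cleared if the entry is now edge-triggered.
 */
static uint64_t rte_write(uint64_t old_entry, uint64_t written)
{
    uint64_t e = (written & ~RTE_REMOTE_IRR) | (old_entry & RTE_REMOTE_IRR);

    if (!(e & RTE_TRIGGER_LEVEL))
        e &= ~RTE_REMOTE_IRR;
    return e;
}
```

Rewriting an entry that stays level-triggered leaves Remote IRR alone, which is the behavior change noted as a side effect in the commit message. Only the detour through edge mode clears it.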
    • N
      KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq · 7d225368
      Nikita Leshenko committed
      Remote IRR for level-triggered interrupts was previously checked in
      ioapic_set_irq, but since we now have a check in ioapic_service we
      can remove the redundant check from ioapic_set_irq.
      
      This commit doesn't change semantics.
      Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      7d225368
    • N
      KVM: x86: ioapic: Don't fire level irq when Remote IRR set · da3fe7bd
      Nikita Leshenko committed
      Avoid firing a level-triggered interrupt that has the Remote IRR bit set,
      because that means that some CPU is already processing it. The Remote
      IRR bit will be cleared after an EOI and the interrupt will refire
      if the irq line is still asserted.
      
      This behavior is aligned with QEMU's IOAPIC implementation that was
      introduced by commit f99b86b94987
      ("x86: ioapic: ignore level irq during processing") in QEMU.
      Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      da3fe7bd
    • N
      KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race · 0fc5a36d
      Nikita Leshenko committed
      KVM uses ioapic_handled_vectors to track vectors that need to notify the
      IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an
      interrupt with old configuration is pending or running and
      ioapic_handled_vectors only remembers the newest configuration;
      thus EOI from the old interrupt is not delivered to the IOAPIC.
      
      A previous commit db2bdcbb
      ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
      addressed this issue by adding pending edge-triggered interrupts to
      ioapic_handled_vectors, fixing this race for edge-triggered interrupts.
      The commit explicitly ignored level-triggered interrupts,
      but this race applies to them as well:
      
      1) IOAPIC sends a level triggered interrupt vector to VCPU0
      2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC
         to route the vector to VCPU1. The reconfiguration rewrites only the
         upper 32 bits of the IOREDTBLn register. (Causes KVM to update
         ioapic_handled_vectors for VCPU0 and it no longer includes the vector.)
      3) VCPU0 sends EOI for the vector, but it's not delivered to the
         IOAPIC because the ioapic_handled_vectors doesn't include the vector.
      4) New interrupts are not delivered to VCPU1 because remote_irr bit
         is set forever.
      
      Therefore, the correct behavior is to add all pending and running
      interrupts to ioapic_handled_vectors.
      
      This commit introduces a slight performance hit similar to
      commit db2bdcbb ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
      for the rare case that the vector is reused by a non-IOAPIC source on
      VCPU0. We prefer to keep the solution simple and not handle this case,
      just as the original commit does.
      
      Fixes: db2bdcbb ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
      Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      0fc5a36d
    • P
      KVM: x86: inject exceptions produced by x86_decode_insn · 6ea6e843
      Paolo Bonzini committed
      Sometimes, a processor might execute an instruction while another
      processor is updating the page tables for that instruction's code page,
      but before the TLB shootdown completes.  The interesting case happens
      if the page is in the TLB.
      
      In general, the processor will succeed in executing the instruction and
      nothing bad happens.  However, what if the instruction is an MMIO access?
      If *that* happens, KVM invokes the emulator, and the emulator gets the
      updated page tables.  If the update side had marked the code page as non
      present, the page table walk then will fail and so will x86_decode_insn.
      
      Unfortunately, even though kvm_fetch_guest_virt is correctly returning
      X86EMUL_PROPAGATE_FAULT, x86_decode_insn's caller treats the failure as
      a fatal error if the instruction cannot simply be reexecuted (as is the
      case for MMIO).  And this in fact happened sometimes when rebooting
      Windows 2012r2 guests.  Just checking ctxt->have_exception and injecting
      the exception if true is enough to fix the case.
      
      Thanks to Eduardo Habkost for helping in the debugging of this issue.
      Reported-by: Yanan Fu <yfu@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      6ea6e843
    • E
      KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs · fab0aa3b
      Eyal Moscovici committed
      Some guests use these unhandled MSRs very frequently.
      This causes dmesg to fill up with aggregated messages about the
      use of ignored MSRs. Since ignore_msrs=true means that the user is
      well aware that the guest uses ignored MSRs, also allow disabling the
      prints on their usage.
      
      An example of such a guest is ESXi, which accesses MSR 0x34
      (MSR_SMI_COUNT) very frequently.
      
      In addition, we have observed this to cause unnecessary delays to
      guest execution. For example, ESXi experiences networking
      delays in its guests (L2 guests) because of these prints (even when
      the prints are rate-limited). This can easily be reproduced by pinging
      from one L2 guest to another. Once in a while, a spike in ping RTT
      will be observed. Removing these unhandled MSR prints solves the
      issue.
      
      Because these prints can help diagnose issues with guests,
      this commit only suppresses them via a module parameter instead of
      removing them from the code entirely.
      Signed-off-by: Eyal Moscovici <eyal.moscovici@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [Changed suppress_ignore_msrs_prints to report_ignored_msrs - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      fab0aa3b
    • D
      KVM: x86: fix em_fxstor() sleeping while in atomic · 4d772cb8
      David Hildenbrand committed
      Commit 9d643f63 ("KVM: x86: avoid large stack allocations in
      em_fxrstor") optimized the stack size, but introduced a guest memory access
      which might sleep while in atomic context.
      
      Fix it by introducing, again, a second fxregs_state. Try to avoid
      large stacks by using noinline. Add some helpful comments.
      
      Reported by syzbot:
      
      in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109
      2 locks held by syzkaller879109/2909:
        #0:  (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>] vcpu_load+0x1c/0x70
      arch/x86/kvm/../../../virt/kvm/kvm_main.c:154
        #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest
      arch/x86/kvm/x86.c:6983 [inline]
        #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run
      arch/x86/kvm/x86.c:7061 [inline]
        #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>]
      kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222
      CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted 4.13.0-rc4-next-20170811
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:16 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:52
        ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014
        __might_sleep+0x95/0x190 kernel/sched/core.c:5967
        __might_fault+0xab/0x1d0 mm/memory.c:4383
        __copy_from_user include/linux/uaccess.h:71 [inline]
        __kvm_read_guest_page+0x58/0xa0
      arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771
        kvm_vcpu_read_guest_page+0x44/0x60
      arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791
        kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407
        kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466
        segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819
        em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022
        x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471
        x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698
        kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854
        handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400
        vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718
        vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline]
        vcpu_run arch/x86/kvm/x86.c:7061 [inline]
        kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222
        kvm_vcpu_ioctl+0x64c/0x1010 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591
        vfs_ioctl fs/ioctl.c:45 [inline]
        do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
        SYSC_ioctl fs/ioctl.c:700 [inline]
        SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
        entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x437fc9
      RSP: 002b:00007ffc7b4d5ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000437fc9
      RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000020ae8000
      R10: 0000000000009120 R11: 0000000000000206 R12: 0000000000000000
      R13: 0000000000000004 R14: 0000000000000004 R15: 0000000020077000
      
      Fixes: 9d643f63 ("KVM: x86: avoid large stack allocations in em_fxrstor")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      4d772cb8
    • W
      KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure · 5af41573
      Wanpeng Li committed
      Commit 4f350c6d (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure
      properly) can result in a null pointer dereference in L1 (when running
      kvm-unit-tests/run_tests.sh vmx_controls in L1) and also a call trace in L0
      when EPT=0 on both L0 and L1.
      
      In L1:
      
      BUG: unable to handle kernel paging request at ffffffffc015bf8f
       IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
       PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
       Oops: 0003 [#1] PREEMPT SMP
       CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
       RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
       Call Trace:
       WARNING: kernel stack frame pointer at ffffb86f4988bc18 in qemu-system-x86:1798 has bad value 0000000000000002
      
      In L0:
      
      -----------[ cut here ]------------
       WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845 vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
       CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc7+ #25
       RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
       Call Trace:
        paging64_page_fault+0x500/0xde0 [kvm]
        ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
        ? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
        ? __asan_storeN+0x12/0x20
        ? paging64_gva_to_gpa+0xb0/0x120 [kvm]
        ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
        ? lock_acquire+0x2c0/0x2c0
        ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
        ? vmx_get_segment+0x2a6/0x310 [kvm_intel]
        ? sched_clock+0x1f/0x30
        ? check_chain_key+0x137/0x1e0
        ? __lock_acquire+0x83c/0x2420
        ? kvm_multiple_exception+0xf2/0x220 [kvm]
        ? debug_check_no_locks_freed+0x240/0x240
        ? debug_smp_processor_id+0x17/0x20
        ? __lock_is_held+0x9e/0x100
        kvm_mmu_page_fault+0x90/0x180 [kvm]
        kvm_handle_page_fault+0x15c/0x310 [kvm]
        ? __lock_is_held+0x9e/0x100
        handle_exception+0x3c7/0x4d0 [kvm_intel]
        vmx_handle_exit+0x103/0x1010 [kvm_intel]
        ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]
      
      The commit avoids loading the host state of vmcs12 as vmcs01's guest state,
      since vmcs12 is not modified (except for the VM-instruction error field)
      if the check of the vmcs control area fails. However, the mmu context is
      switched to the nested mmu in prepare_vmcs02(), and it will not be reloaded
      since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME
      fails. This patch fixes it by reloading the mmu context when nested
      VMLAUNCH/VMRESUME fails.
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      5af41573
    • W
      KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry · f1b026a3
      Wanpeng Li committed
      According to the SDM, if the "load IA32_BNDCFGS" VM-entry control is 1, the
      following checks are performed on the field for the IA32_BNDCFGS MSR:
       - Bits reserved in the IA32_BNDCFGS MSR must be 0.
       - The linear address in bits 63:12 must be canonical.
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      f1b026a3
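The two IA32_BNDCFGS checks listed in commit f1b026a3 can be sketched as a validation function. Two things here are assumptions rather than facts from the commit message: the reserved-bit mask (bits 11:2, leaving bits 0 and 1 as defined control bits) and a 48-bit linear address width for the canonical check. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BNDCFGS_RSVD_MASK 0x0000000000000ffcULL  /* assumption: bits 11:2 */
#define BNDCFGS_ADDR_MASK (~0xfffULL)            /* bits 63:12 */

/* Canonical for 48-bit addressing: bits 63:47 are all 0 or all 1. */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return top == 0 || top == 0x1ffff;
}

/* Both checks from the commit message: reserved bits clear, base canonical. */
static bool bndcfgs_valid(uint64_t bndcfgs)
{
    if (bndcfgs & BNDCFGS_RSVD_MASK)
        return false;
    return is_canonical48(bndcfgs & BNDCFGS_ADDR_MASK);
}
```

On a nested VM-entry, a value that fails either check would cause the entry to be rejected instead of being loaded into the guest state.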
    • W
      KVM: X86: Fix operand/address-size during instruction decoding · 3853be26
      Wanpeng Li committed
      Pedro reported:
        During tests that we conducted on KVM, we noticed that executing a "PUSH %ES"
        instruction under KVM produces different results on both memory and the SP
        register depending on whether EPT support is enabled. With EPT the SP is
        reduced by 4 bytes (and the written value is 0-padded) but without EPT support
        it is only reduced by 2 bytes. The difference can be observed when the CS.DB
        field is 1 (32-bit) but not when it's 0 (16-bit).
      
      The internal segment descriptor cache exists even in real/vm8086 mode. CS.D
      should therefore also be respected during instruction decoding, instead of
      relying only on the default operand/address size and the 66H/67H prefixes.
      This patch fixes it by also adjusting the operand/address size according
      to CS.D.
      Reported-by: Pedro Fonseca <pfonseca@cs.washington.edu>
      Tested-by: Pedro Fonseca <pfonseca@cs.washington.edu>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Pedro Fonseca <pfonseca@cs.washington.edu>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      3853be26
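The operand-size rule behind commit 3853be26 can be illustrated with a short sketch. It covers only 16-bit and 32-bit code segments (no 64-bit mode) and ignores the stack address size; the function names are hypothetical, and the PUSH helper simply subtracts the effective operand size from SP as in the report quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Effective operand size in bytes: CS.D selects the default (1 = 32-bit,
 * 0 = 16-bit) and a 66H prefix toggles it.
 */
static int effective_operand_size(bool cs_d, bool prefix_66)
{
    int default_size = cs_d ? 4 : 2;

    if (prefix_66)
        return default_size == 4 ? 2 : 4;
    return default_size;
}

/* SP after pushing a segment register with the given decoding context. */
static uint32_t sp_after_push_seg(uint32_t sp, bool cs_d, bool prefix_66)
{
    return sp - (uint32_t)effective_operand_size(cs_d, prefix_66);
}
```

With CS.D = 1 and no prefix, a "PUSH %ES" lowers SP by 4, which matches the hardware behavior in the report; a decoder that ignores CS.D in real mode would lower it by 2 instead.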
    • L
      KVM: x86: Don't re-execute instruction when not passing CR2 value · 9b8ae637
      Liran Alon committed
      In case of instruction-decode failure or emulation failure,
      x86_emulate_instruction() will call reexecute_instruction() which will
      attempt to use the cr2 value passed to x86_emulate_instruction().
      However, when x86_emulate_instruction() is called from
      emulate_instruction(), cr2 is not passed (passed as 0) and therefore
      it doesn't make sense to execute reexecute_instruction() logic at all.
      
      Fixes: 51d8b661 ("KVM: cleanup emulate_instruction")
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      9b8ae637
    • L
      KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure · 1f4dcb3b
      Liran Alon committed
      In this case, handle_emulation_failure() fills kvm_run with
      internal-error information which it expects to be delivered
      to user-mode for further processing.
      However, the code reports a wrong return value, which causes KVM to never
      return to user-mode in this scenario.
      
      Fixes: 6d77dbfc ("KVM: inject #UD if instruction emulation fails and exit to
      userspace")
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      1f4dcb3b
    • L
      KVM: x86: Exit to user-mode on #UD intercept when emulator requires · 61cb57c9
      Liran Alon committed
      Instruction emulation after trapping a #UD exception can result in an
      MMIO access, for example when emulating a MOVBE on a processor that
      doesn't support the instruction.  In this case, the #UD vmexit handler
      must exit to user mode, but there wasn't any code to do so.  Add it for
      both VMX and SVM.
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      61cb57c9
    • L
      KVM: nVMX/nSVM: Don't intercept #UD when running L2 · ac9b305c
      Liran Alon committed
      When running L2, #UD should be intercepted by L1 or just forwarded
      directly to L2. It should not reach L0 x86 emulator.
      Therefore, set intercept for #UD only based on L1 exception-bitmap.
      
      Also add WARN_ON_ONCE() on L0 #UD intercept handlers to make sure
      it is never reached while running L2.
      
      This improves commit ae1f5767 ("KVM: nVMX: Do not emulate #UD while
      in guest mode") by removing an unnecessary exit from L2 to L0 on #UD
      when L1 doesn't intercept it.
      
      In addition, the SVM L0 #UD intercept handler doesn't correctly handle
      the case where the #UD is raised from L2. In this case, it should forward
      the #UD to the guest instead of the x86 emulator, as is done in the VMX
      #UD intercept handler. This commit fixes this issue as well.
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      ac9b305c
    • L
      KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk · 51c4b8bb
      Liran Alon committed
      When the guest passes KVM its pvclock-page GPA via WRMSR to
      MSR_KVM_SYSTEM_TIME / MSR_KVM_SYSTEM_TIME_NEW, KVM doesn't initialize
      the pvclock-page to some start values. It just requests a clock update,
      which will happen before entering the guest.
      
      The clock-update logic will call kvm_setup_pvclock_page() to update the
      pvclock-page with info. However, kvm_setup_pvclock_page() *wrongly*
      assumes that the version field is initialized to an even number. This is
      wrong because on the first write the field could hold any value.
      
      The fix simply makes sure that if the version field is odd on the first
      write, it is incremented once more to make it even, and only then does
      the standard logic start.
      This follows the same logic as in other pvclock shared pages (see
      kvm_write_wall_clock() and record_steal_time()).
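The version-field handling described above can be sketched roughly as follows. This is a simplified illustration, not the kernel's kvm_setup_pvclock_page(): the struct pv_time_page layout and the publish_time() helper are hypothetical, and the memory barriers and guest-memory accessors that real code needs are reduced to comments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the guest-visible pvclock page. */
struct pv_time_page {
    uint32_t version;      /* odd while the host is in the middle of an update */
    uint64_t system_time;  /* payload protected by the version field */
};

/*
 * Illustrative update routine (assumed name). An even version means the
 * payload is stable; an odd version tells the reader to retry.
 */
static void publish_time(struct pv_time_page *page, uint64_t now)
{
    uint32_t v = page->version;

    /* First write: the page may contain junk, so force an even base. */
    if (v & 1)
        v++;

    page->version = v + 1;   /* odd: update in progress */
    /* real code needs a write barrier here */
    page->system_time = now;
    /* ... and another write barrier here */
    page->version = v + 2;   /* even: update complete */

    assert((page->version & 1) == 0);
}
```

Without the initial parity check, a page that starts with an odd version would be left odd after the update completes, so a reader following the even/odd convention would keep retrying.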
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      51c4b8bb
    • P
      kvm: vmx: Allow disabling virtual NMI support · d02fcf50
      Paolo Bonzini committed
      To simplify testing of these rarely used code paths, add a module parameter
      that turns it on.  One eventinj.flat test (NMI after iret) fails when
      loading kvm_intel with vnmi=0.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      d02fcf50
    • P
      kvm: vmx: Reinstate support for CPUs without virtual NMI · 8a1b4392
      Paolo Bonzini committed
      This is more or less a revert of commit 2c82878b ("KVM: VMX: require
      virtual NMI support", 2017-03-27); it turns out that Core 2 Duo machines
      only had virtual NMIs in some SKUs.
      
      The revert is not trivial because in the meantime there have been several
      fixes to nested NMI injection.  Therefore, the entire vNMI state is moved
      to struct loaded_vmcs.
      
      Another change compared to before the patch is a simplification here:
      
             if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked &&
                 !(is_guest_mode(vcpu) && nested_cpu_has_virtual_nmis(
                                             get_vmcs12(vcpu))))) {
      
      The final condition here is always true (because nested_cpu_has_virtual_nmis
      is always false) and is removed.
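The simplification can be checked with a small truth-table sketch. It takes the message's statement that nested_cpu_has_virtual_nmis() is false as given; the function and parameter names below are illustrative stand-ins for the kernel expressions, not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as quoted in the message, with plain booleans as inputs. */
static bool cond_full(bool has_vnmi, bool soft_blocked,
                      bool guest_mode, bool nested_vnmi)
{
    return !has_vnmi && soft_blocked && !(guest_mode && nested_vnmi);
}

/* Condition after dropping the final term. */
static bool cond_simplified(bool has_vnmi, bool soft_blocked)
{
    return !has_vnmi && soft_blocked;
}

/*
 * Counts the input combinations on which the two forms disagree when
 * nested_vnmi is fixed to false (the assumption stated in the message).
 */
static int count_mismatches(void)
{
    int mismatches = 0;

    for (int bits = 0; bits < 8; bits++) {
        bool has_vnmi = bits & 1;
        bool soft_blocked = bits & 2;
        bool guest_mode = bits & 4;

        if (cond_full(has_vnmi, soft_blocked, guest_mode, false) !=
            cond_simplified(has_vnmi, soft_blocked))
            mismatches++;
    }
    return mismatches;
}
```

With nested_vnmi held false, the term !(guest_mode && nested_vnmi) is true for every input, so the two forms never disagree.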
      
      Fixes: 2c82878b
      Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1490803
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      8a1b4392
    • P
      KVM: SVM: obey guest PAT · 15038e14
      Paolo Bonzini committed
      For many years some users of assigned devices have reported worse
      performance on AMD processors with NPT than on AMD without NPT,
      Intel or bare metal.
      
      The reason turned out to be that SVM is discarding the guest PAT
      setting and uses the default (PA0=PA4=WB, PA1=PA5=WT, PA2=PA6=UC-,
      PA3=UC).  The guest might be using a different setting, and
      especially might want write combining but isn't getting it
      (instead getting slow UC or UC- accesses).
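To make the default layout quoted above concrete, the sketch below packs eight PAT entries into a 64-bit register value, one entry per byte. The memory-type encodings are the architectural x86 ones; the helper names pat_pack() and pat_entry() are hypothetical and are not taken from KVM.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 PAT memory-type encodings. */
enum pat_type {
    PAT_UC       = 0x00,  /* uncacheable */
    PAT_WC       = 0x01,  /* write combining */
    PAT_WT       = 0x04,  /* write through */
    PAT_WP       = 0x05,  /* write protected */
    PAT_WB       = 0x06,  /* write back */
    PAT_UC_MINUS = 0x07   /* UC-, overridable by MTRRs */
};

/* Pack entries PA0..PA7 into a PAT value; PAi occupies byte i. */
static uint64_t pat_pack(const uint8_t pa[8])
{
    uint64_t value = 0;

    for (int i = 0; i < 8; i++) {
        assert(pa[i] <= 0x07);  /* only the low three bits are defined */
        value |= (uint64_t)pa[i] << (8 * i);
    }
    return value;
}

/* Extract entry PAi from a packed PAT value. */
static uint8_t pat_entry(uint64_t pat, unsigned int idx)
{
    assert(idx < 8);
    return (uint8_t)((pat >> (8 * idx)) & 0x07);
}
```

Packing the default quoted in the message (WB, WT, UC-, UC, repeated for PA4 to PA7) gives 0x0007040600070406. A guest that wants write combining would place PAT_WC in one of the entries, which is the setting the message says was being discarded.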
      
      Thanks a lot to geoff@hostfission.com for noticing the relation
      to the g_pat setting.  The patch has been tested also by a bunch
      of people on VFIO users forums.
      
      Fixes: 709ddebf
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196409
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Tested-by: Nick Sarnie <commendsarnex@gmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      15038e14
    • P
      Merge tag 'kvm-arm-gicv4-for-v4.15' of... · fc3790fa
      Paolo Bonzini committed
      Merge tag 'kvm-arm-gicv4-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      GICv4 Support for KVM/ARM for v4.15
      fc3790fa
    • L
      Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · cf9b0772
      Linus Torvalds committed
      Pull ARM SoC driver updates from Arnd Bergmann:
       "This branch contains platform-related driver updates for ARM and
        ARM64, these are the areas that bring the changes:
      
        New drivers:
      
         - driver support for Renesas R-Car V3M (R8A77970)
      
         - power management support for Amlogic GX
      
         - a new driver for the Tegra BPMP thermal sensor
      
         - a new bus driver for Technologic Systems NBUS
      
        Changes for subsystems that prefer to merge through arm-soc:
      
         - the usual updates for reset controller drivers from Philipp Zabel,
           with five added drivers for SoCs in the arc, meson, socfpga,
           uniphier and mediatek families
      
         - updates to the ARM SCPI and PSCI frameworks, from Sudeep Holla,
           Heiner Kallweit and Lorenzo Pieralisi
      
        Changes specific to some ARM-based SoC
      
         - the Freescale/NXP DPAA QBMan drivers from PowerPC can now work on
           ARM as well
      
         - several changes for power management on Broadcom SoCs
      
         - various improvements on Qualcomm, Broadcom, Amlogic, Atmel,
           Mediatek
      
         - minor cleanups for Samsung, TI OMAP SoCs"
      
      [ NOTE! This doesn't work without the previous ARM SoC device-tree pull,
        because the R8A77970 driver is missing a header file that came from
        that pull.
      
        The fact that this got merged afterwards only fixes it at this point,
        and bisection of that driver will fail if/when you walk into the
        history of that driver.           - Linus ]
      
      * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (96 commits)
        soc: amlogic: meson-gx-pwrc-vpu: fix power-off when powered by bootloader
        bus: add driver for the Technologic Systems NBUS
        memory: omap-gpmc: Remove deprecated gpmc_update_nand_reg()
        soc: qcom: remove unused label
        soc: amlogic: gx pm domain: add PM and OF dependencies
        drivers/firmware: psci_checker: Add missing destroy_timer_on_stack()
        dt-bindings: power: add amlogic meson power domain bindings
        soc: amlogic: add Meson GX VPU Domains driver
        soc: qcom: Remote filesystem memory driver
        dt-binding: soc: qcom: Add binding for rmtfs memory
        of: reserved_mem: Accessor for acquiring reserved_mem
        of/platform: Generalize /reserved-memory handling
        soc: mediatek: pwrap: fix fatal compiler error
        soc: mediatek: pwrap: fix compiler errors
        arm64: mediatek: cleanup message for platform selection
        soc: Allow test-building of MediaTek drivers
        soc: mediatek: place Kconfig for all SoC drivers under menu
        soc: mediatek: pwrap: add support for MT7622 SoC
        soc: mediatek: pwrap: add common way for setup CS timing extenstion
        soc: mediatek: pwrap: add MediaTek MT6380 as one slave of pwrap
        ..
      cf9b0772
    • L
      Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 527d1470
      Linus Torvalds committed
      Pull ARM device-tree updates from Arnd Bergmann:
       "We add device tree files for a couple of additional SoCs in various
        areas:
      
        Allwinner R40/V40 for entertainment, Broadcom Hurricane 2 for
        networking, Amlogic A113D for audio, and Renesas R-Car V3M for
        automotive.
      
        As usual, lots of new boards get added based on those and other SoCs:
      
         - Actions S500 based CubieBoard6 single-board computer
      
         - Amlogic Meson-AXG A113D based development board
         - Amlogic S912 based Khadas VIM2 single-board computer
         - Amlogic S912 based Tronsmart Vega S96 set-top-box
      
         - Allwinner H5 based NanoPi NEO Plus2 single-board computer
         - Allwinner R40 based Banana Pi M2 Ultra and Berry single-board computers
         - Allwinner A83T based TBS A711 Tablet
      
         - Broadcom Hurricane 2 based Ubiquiti UniFi Switch 8
         - Broadcom bcm47xx based Luxul XAP-1440/XAP-810/ABR-4500/XBR-4500
           wireless access points and routers
      
         - NXP i.MX51 based Zodiac Inflight Innovations RDU1 board
         - NXP i.MX53 based GE Healthcare PPD biometric monitor
         - NXP i.MX6 based Pistachio single-board computer
         - NXP i.MX6 based Vining-2000 automotive diagnostic interface
         - NXP i.MX6 based Ka-Ro TX6 Computer-on-Module in additional variants
      
         - Qualcomm MSM8974 (Snapdragon 800) based Fairphone 2 phone
         - Qualcomm MSM8974pro (Snapdragon 801) based Sony Xperia Z2 Tablet
      
         - Realtek RTD1295 based set-top-boxes MeLE V9 and PROBOX2 AVA
      
         - Renesas R-Car V3M (R8A77970) SoC and "Eagle" reference board
         - Renesas H3ULCB and M3ULCB "Kingfisher" extension infotainment boards
         - Renesas r8a7745 based iWave G22D-SODIMM SoM
      
         - Rockchip rk3288 based Amarula Vyasa single-board computer
      
         - Samsung Exynos5800 based Odroid HC1 single-board computer
      
        For existing SoC support, there was a lot of ongoing work, as usual
        most of that concentrated on the Renesas, Rockchip, OMAP, i.MX,
        Amlogic and Allwinner platforms, but others were also active.
      
        Rob Herring and many others worked on reducing the number of issues
        that the latest version of 'dtc' now warns about. Unfortunately there
        is still a lot left to do.
      
        A rework of the ARM foundation model introduced several new files for
        common variations of the model"
      
      * tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (599 commits)
        arm64: dts: uniphier: route on-board device IRQ to GPIO controller for PXs3
        dt-bindings: bus: Add documentation for the Technologic Systems NBUS
        arm64: dts: actions: s900-bubblegum-96: Add fake uart5 clock
        ARM: dts: owl-s500: Add CubieBoard6
        dt-bindings: arm: actions: Add CubieBoard6
        ARM: dts: owl-s500-guitar-bb-rev-b: Add fake uart3 clock
        ARM: dts: owl-s500: Set power domains for CPU2 and CPU3
        arm: dts: mt7623: remove unused compatible string for pio node
        arm: dts: mt7623: update usb related nodes
        arm: dts: mt7623: update crypto node
        ARM: dts: sun8i: a711: Enable USB OTG
        ARM: dts: sun8i: a711: Add regulator support
        ARM: dts: sun8i: a83t: bananapi-m3: Enable AP6212 WiFi on mmc1
        ARM: dts: sun8i: a83t: cubietruck-plus: Enable AP6330 WiFi on mmc1
        ARM: dts: sun8i: a83t: Move mmc1 pinctrl setting to dtsi file
        ARM: dts: sun8i: a83t: allwinner-h8homlet-v2: Add AXP818 regulator nodes
        ARM: dts: sun8i: a83t: bananapi-m3: Add AXP813 regulator nodes
        ARM: dts: sun8i: a83t: cubietruck-plus: Add AXP818 regulator nodes
        ARM: dts: sunxi: Add dtsi for AXP81x PMIC
        arm64: dts: allwinner: H5: Restore EMAC changes
        ...
      527d1470
    • L
      Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 8c609698
      Linus Torvalds committed
      Pull ARM SoC platform updates from Arnd Bergmann:
       "Most of the commits are for defconfig changes, to enable newly added
        drivers or features that people have started using. For the changed
        lines, we have mostly cleanups; the affected platforms are OMAP,
        Versatile, EP93xx, Samsung, Broadcom, i.MX, and Actions.
      
        The largest single change is the introduction of the TI "sysc" bus
        driver, with the intention of cleaning up more legacy code.
      
        Two new SoC platforms get added this time:
      
         - Allwinner R40 is a modernized version of the A20 chip, now with a
           Quad-Core ARM Cortex-A7. According to the manufacturer, it is
           intended for "Smart Hardware"
      
         - Broadcom Hurricane 2 (Aka Strataconnect BCM5334X) is a family of
           chips meant for managed gigabit ethernet switches, based around a
           Cortex-A9 CPU.
      
        Finally, we gain SMP support for two platforms: Renesas R-Car E2 and
        Amlogic Meson8/8b, which were previously added but only supported
        uniprocessor operation"
      
      * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (118 commits)
        ARM: multi_v7_defconfig: Select RPMSG_VIRTIO as module
        ARM: multi_v7_defconfig: enable CONFIG_GPIO_UNIPHIER
        arm64: defconfig: enable CONFIG_GPIO_UNIPHIER
        ARM: meson: enable MESON_IRQ_GPIO in Kconfig for meson8b
        ARM: meson: Add SMP bringup code for Meson8 and Meson8b
        ARM: smp_scu: allow the platform code to read the SCU CPU status
        ARM: smp_scu: add a helper for powering on a specific CPU
        dt-bindings: Amlogic: Add Meson8 and Meson8b SMP related documentation
        ARM: OMAP3: Delete an unnecessary variable initialisation in omap3xxx_hwmod_init()
        ARM: OMAP3: Use common error handling code in omap3xxx_hwmod_init()
        ARM: defconfig: select the right SX150X driver
        arm64: defconfig: Enable QCOM_IOMMU
        arm64: Add ThunderX drivers to defconfig
        arm64: defconfig: Enable Tegra PCI controller
        cpufreq: imx6q: Move speed grading check to cpufreq driver
        arm64: defconfig: re-enable Qualcomm DB410c USB
        ARM: configs: stm32: Add MDMA support in STM32 defconfig
        ARM: imx: Enable cpuidle for i.MX6DL starting at 1.1
        bus: ti-sysc: Fix unbalanced pm_runtime_enable by adding remove
        bus: ti-sysc: mark PM functions as __maybe_unused
        ...
      8c609698
    • L
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 18c83d2c
      Linus Torvalds committed
      Pull virtio updates from Michael Tsirkin:
       "Fixes in qemu, vhost and virtio"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        fw_cfg: fix the command line module name
        vhost/vsock: fix uninitialized vhost_vsock->guest_cid
        vhost: fix end of range for access_ok
        vhost/scsi: Use safe iteration in vhost_scsi_complete_cmd_work()
        virtio_balloon: fix deadlock on OOM
      18c83d2c
    • L
      Merge tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 051089a2
      Linus Torvalds committed
      Pull xen updates from Juergen Gross:
       "Xen features and fixes for v4.15-rc1
      
        Apart from several small fixes it contains the following features:
      
         - a series by Joao Martins to add vdso support of the pv clock
           interface
      
         - a series by Juergen Gross to add support for Xen pv guests to be
           able to run on 5 level paging hosts
      
         - a series by Stefano Stabellini adding the Xen pvcalls frontend
           driver using a paravirtualized socket interface"
      
      * tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (34 commits)
        xen/pvcalls: fix potential endless loop in pvcalls-front.c
        xen/pvcalls: Add MODULE_LICENSE()
        MAINTAINERS: xen, kvm: track pvclock-abi.h changes
        x86/xen/time: setup vcpu 0 time info page
        x86/xen/time: set pvclock flags on xen_time_init()
        x86/pvclock: add setter for pvclock_pvti_cpu0_va
        ptp_kvm: probe for kvm guest availability
        xen/privcmd: remove unused variable pageidx
        xen: select grant interface version
        xen: update arch/x86/include/asm/xen/cpuid.h
        xen: add grant interface version dependent constants to gnttab_ops
        xen: limit grant v2 interface to the v1 functionality
        xen: re-introduce support for grant v2 interface
        xen: support priv-mapping in an HVM tools domain
        xen/pvcalls: remove redundant check for irq >= 0
        xen/pvcalls: fix unsigned less than zero error check
        xen/time: Return -ENODEV from xen_get_wallclock()
        xen/pvcalls-front: mark expected switch fall-through
        xen: xenbus_probe_frontend: mark expected switch fall-throughs
        xen/time: do not decrease steal time after live migration on xen
        ...
      051089a2
    • L
      Merge tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 974aa563
      Linus Torvalds committed
      Pull KVM updates from Radim Krčmář:
       "First batch of KVM changes for 4.15
      
        Common:
         - Python 3 support in kvm_stat
         - Accounting of slabs to kmemcg
      
        ARM:
         - Optimized arch timer handling for KVM/ARM
         - Improvements to the VGIC ITS code and introduction of an ITS reset
           ioctl
         - Unification of the 32-bit fault injection logic
         - More exact external abort matching logic
      
        PPC:
         - Support for running hashed page table (HPT) MMU mode on a host that
           is using the radix MMU mode; single threaded mode on POWER 9 is
           added as a pre-requisite
         - Resolution of merge conflicts with the last second 4.14 HPT fixes
         - Fixes and cleanups
      
        s390:
         - Some initial preparation patches for exitless interrupts and crypto
         - New capability for AIS migration
         - Fixes
      
        x86:
         - Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs,
           and after-reset state
         - Refined dependencies for VMX features
         - Fixes for nested SMI injection
         - A lot of cleanups"
      
      * tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits)
        KVM: s390: provide a capability for AIS state migration
        KVM: s390: clear_io_irq() requests are not expected for adapter interrupts
        KVM: s390: abstract conversion between isc and enum irq_types
        KVM: s390: vsie: use common code functions for pinning
        KVM: s390: SIE considerations for AP Queue virtualization
        KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup
        KVM: PPC: Book3S HV: Cosmetic post-merge cleanups
        KVM: arm/arm64: fix the incompatible matching for external abort
        KVM: arm/arm64: Unify 32bit fault injection
        KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET
        KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET
        KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared
        KVM: arm/arm64: vgic-its: New helper functions to free the caches
        KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device
        arm/arm64: KVM: Load the timer state when enabling the timer
        KVM: arm/arm64: Rework kvm_timer_should_fire
        KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
        KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
        KVM: arm/arm64: Move phys_timer_emulate function
        KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
        ...
      974aa563
    • L
      Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 441692aa
      Linus Torvalds committed
      Pull ARM updates from Russell King:
      
       - add support for ELF fdpic binaries on both MMU and noMMU platforms
      
       - linker script cleanups
      
       - support for compressed .data section for XIP images
      
       - discard memblock arrays when possible
      
       - various cleanups
      
       - atomic DMA pool updates
      
       - better diagnostics of missing/corrupt device tree
      
       - export information to allow userspace kexec tool to place images more
         intelligently, so that the device tree isn't overwritten by the
         booting kernel
      
       - make early_printk more efficient on semihosted systems
      
       - noMMU cleanups
      
       - SA1111 PCMCIA update in preparation for further cleanups
      
      * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (38 commits)
        ARM: 8719/1: NOMMU: work around maybe-uninitialized warning
        ARM: 8717/2: debug printch/printascii: translate '\n' to "\r\n" not "\n\r"
        ARM: 8713/1: NOMMU: Support MPU in XIP configuration
        ARM: 8712/1: NOMMU: Use more MPU regions to cover memory
        ARM: 8711/1: V7M: Add support for MPU to M-class
        ARM: 8710/1: Kconfig: Kill CONFIG_VECTORS_BASE
        ARM: 8709/1: NOMMU: Disallow MPU for XIP
        ARM: 8708/1: NOMMU: Rework MPU to be mostly done in C
        ARM: 8707/1: NOMMU: Update MPU accessors to use cp15 helpers
        ARM: 8706/1: NOMMU: Move out MPU setup in separate module
        ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()
        ARM: 8705/1: early_printk: use printascii() rather than printch()
        ARM: 8703/1: debug.S: move hexbuf to a writable section
        ARM: add additional table to compressed kernel
        ARM: decompressor: fix BSS size calculation
        pcmcia: sa1111: remove special sa1111 mmio accessors
        pcmcia: sa1111: use sa1111_get_irq() to obtain IRQ resources
        ARM: better diagnostics with missing/corrupt dtb
        ARM: 8699/1: dma-mapping: Remove init_dma_coherent_pool_size()
        ARM: 8698/1: dma-mapping: Mark atomic_pool as __ro_after_init
        ..
      441692aa
    • L
      Merge tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 5b0e2cb0
      Linus Torvalds committed
      Pull powerpc updates from Michael Ellerman:
       "A bit of a small release, I suspect in part due to me travelling for
        KS. But my backlog of patches to review is smaller than usual, so I
        think in part folks just didn't send as much this cycle.
      
        Non-highlights:
      
         - Five fixes for the >128T address space handling, both to fix bugs
           in our implementation and to bring the semantics exactly into line
           with x86.
      
        Highlights:
      
         - Support for a new OPAL call on bare metal machines which gives us a
           true NMI (ie. is not masked by MSR[EE]=0) for debugging etc.
      
         - Support for Power9 DD2 in the CXL driver.
      
         - Improvements to machine check handling so that uncorrectable errors
           can be reported into the generic memory_failure() machinery.
      
         - Some fixes and improvements for VPHN, which is used under PowerVM
           to notify the Linux partition of topology changes.
      
         - Plumbing to enable TM (transactional memory) without suspend on
           some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND).
      
         - Support for emulating vector loads from cache-inhibited memory, on
           some Power9 revisions.
      
         - Disable the fast-endian switch "syscall" by default (behind a
           CONFIG), we believe it has never had any users.
      
         - A major rework of the API drivers use when initiating and waiting
           for long running operations performed by OPAL firmware, and changes
           to the powernv_flash driver to use the new API.
      
         - Several fixes for the handling of FP/VMX/VSX while processes are
           using transactional memory.
      
         - Optimisations of TLB range flushes when using the radix MMU on
           Power9.
      
         - Improvements to the VAS facility used to access coprocessors on
           Power9, and related improvements to the way the NX crypto driver
           handles requests.
      
         - Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
      
        Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew
        Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin
        Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard,
        Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven,
        Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley,
        Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu,
        Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
        Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia
        Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee,
        Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel
        Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A.
        Kennington III"
      
      * tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits)
        powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
        powerpc/64s: Fix masking of SRR1 bits on instruction fault
        powerpc/64s: mm_context.addr_limit is only used on hash
        powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation
        powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary
        powerpc/64s/hash: Fix fork() with 512TB process address space
        powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation
        powerpc/64s/hash: Fix 512T hint detection to use >= 128T
        powerpc: Fix DABR match on hash based systems
        powerpc/signal: Properly handle return value from uprobe_deny_signal()
        powerpc/fadump: use kstrtoint to handle sysfs store
        powerpc/lib: Implement UACCESS_FLUSHCACHE API
        powerpc/lib: Implement PMEM API
        powerpc/powernv/npu: Don't explicitly flush nmmu tlb
        powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
        powerpc/powernv/idle: Round up latency and residency values
        powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
        powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
        powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
        powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
        ...
      5b0e2cb0
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · 758f8758
      Linus Torvalds committed
      Pull user namespace update from Eric Biederman:
       "The only change that is production ready this round is the work to
        increase the number of uid and gid mappings a user namespace can
        support from 5 to 340.
      
        This code was carefully benchmarked and it was confirmed that in the
        existing cases the performance remains the same. In the worst case
        with 340 mappings, cache-cold stat times go from 158ns to 248ns.
        That is noticeable but still quite small, and only the people who are
        doing crazy things pay the cost.
      
        This work uncovered some documentation and cleanup opportunities in
        the mapping code, and patches to make those cleanups and improve the
        documentation will be coming in the next merge window"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        userns: Simplify insert_extent
        userns: Make map_id_down a wrapper for map_id_range_down
        userns: Don't read extents twice in m_start
        userns: Simplify the user and group mapping functions
        userns: Don't special case a count of 0
        userns: bump idmap limits to 340
        userns: use union in {g,u}idmap struct
      758f8758
    • L
      Merge tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · a02cd422
      Linus Torvalds committed
      Pull f2fs updates from Jaegeuk Kim:
       "In this round, we introduce sysfile-based quota support which is
        required for Android by default. In addition, we allow users to
        reserve some blocks at runtime to mitigate performance drops
        in low free space.
      
        Enhancements:
         - assign proper data segments according to write_hints given by user
         - issue cache_flush on dirty devices only among multiple devices
         - exploit cp_error flag and add more faults to enhance fault
           injection test
         - conduct more readaheads during f2fs_readdir
         - add a range for discard commands
      
        Bug fixes:
         - fix zero stat->st_blocks when inline_data is set
         - drop crypto key and free stale memory pointer while evict_inode is
           failing
         - fix some corner cases in free space and segment management
         - fix wrong last_disk_size
      
        This series includes lots of clean-ups and code enhancement in terms
        of xattr operations, discard/flush command control. In addition, it
        adds versatile debugfs entries to monitor f2fs status"
      
      * tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (75 commits)
        f2fs: deny accessing encryption policy if encryption is off
        f2fs: inject fault in inc_valid_node_count
        f2fs: fix to clear FI_NO_PREALLOC
        f2fs: expose quota information in debugfs
        f2fs: separate nat entry mem alloc from nat_tree_lock
        f2fs: validate before set/clear free nat bitmap
        f2fs: avoid opened loop codes in __add_ino_entry
        f2fs: apply write hints to select the type of segments for buffered write
        f2fs: introduce scan_curseg_cache for cleanup
        f2fs: optimize the way of traversing free_nid_bitmap
        f2fs: keep scanning until enough free nids are acquired
        f2fs: trace checkpoint reason in fsync()
        f2fs: keep isize once block is reserved cross EOF
        f2fs: avoid race in between GC and block exchange
        f2fs: save a multiplication for last_nid calculation
        f2fs: fix summary info corruption
        f2fs: remove dead code in update_meta_page
        f2fs: remove unneeded semicolon
        f2fs: don't bother with inode->i_version
        f2fs: check curseg space before foreground GC
        ...
      a02cd422
    • L
      Merge tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 487e2c9f
      Linus Torvalds committed
      Pull AFS updates from David Howells:
       "kAFS filesystem driver overhaul.
      
        The major points of the overhaul are:
      
         (1) Preliminary groundwork is laid for supporting network-namespacing
             of kAFS. The remainder of the namespacing work requires some way
             to pass namespace information to submounts triggered by an
             automount. This requires something like the mount overhaul that's
             in progress.
      
         (2) sockaddr_rxrpc is used in preference to in_addr for holding
             addresses internally, and support is added for talking to the
             YFS VL server. With this, kAFS can do everything over IPv6 as
             well as IPv4 if it's talking to servers that support it.
      
         (3) Callback handling is overhauled to be generally passive rather
             than active. 'Callbacks' are promises by the server to tell us
             about data and metadata changes. Callbacks are now checked when
             we next touch an inode rather than actively going and looking for
             it where possible.
      
         (4) File access permit caching is overhauled to store the caching
             information per-inode rather than per-directory, shared over
             subordinate files. Whilst older AFS servers only allow ACLs on
             directories (shared to the files in that directory), newer AFS
             servers break that restriction.
      
             To improve memory usage and to make it easier to do mass-key
             removal, permit combinations are cached and shared.
      
         (5) Cell database management is overhauled to allow lighter locks to
             be used and to make cell records autonomous state machines that
             look after getting their own DNS records and cleaning themselves
             up, in particular preventing races in acquiring and relinquishing
             the fscache token for the cell.
      
         (6) Volume caching is overhauled. The afs_vlocation record is got rid
             of to simplify things and the superblock is now keyed on the cell
             and the numeric volume ID only. The volume record is tied to a
             superblock and normal superblock management is used to mediate
             the lifetime of the volume fscache token.
      
         (7) File server record caching is overhauled to make server records
             independent of cells and volumes. A server can be in multiple
             cells (in such a case, the administrator must make sure that the
             VL services for all cells correctly reflect the volumes shared
             between those cells).
      
             Server records are now indexed using the UUID of the server
             rather than the address since a server can have multiple
             addresses.
      
         (8) File server rotation is overhauled to handle VMOVED, VBUSY (and
             similar), VOFFLINE and VNOVOL indications and to handle rotation
             both of servers and addresses of those servers. The rotation will
             also wait and retry if the server says it is busy.
      
         (9) Data writeback is overhauled. Each inode no longer stores a list
             of modified sections tagged with the key that authorised it in
             favour of noting the modified region of a page in page->private
             and storing a list of keys that made modifications in the inode.
      
             This simplifies things and allows other keys to be used to
             actually write to the server if a key that made a modification
             becomes useless.
      
        (10) Writable mmap() is implemented. This allows a kernel to be built
             entirely on AFS.
      
        Note that pre-AFS-3.4 servers are no longer supported, though this can
        be added back if necessary (AFS-3.4 was released in 1998)"
      
      * tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (35 commits)
        afs: Protect call->state changes against signals
        afs: Trace page dirty/clean
        afs: Implement shared-writeable mmap
        afs: Get rid of the afs_writeback record
        afs: Introduce a file-private data record
        afs: Use a dynamic port if 7001 is in use
        afs: Fix directory read/modify race
        afs: Trace the sending of pages
        afs: Trace the initiation and completion of client calls
        afs: Fix documentation on # vs % prefix in mount source specification
        afs: Fix total-length calculation for multiple-page send
        afs: Only progress call state at end of Tx phase from rxrpc callback
        afs: Make use of the YFS service upgrade to fully support IPv6
        afs: Overhaul volume and server record caching and fileserver rotation
        afs: Move server rotation code into its own file
        afs: Add an address list concept
        afs: Overhaul cell database management
        afs: Overhaul permit caching
        afs: Overhaul the callback handling
        afs: Rename struct afs_call server member to cm_server
        ...
      487e2c9f