1. 25 9月, 2015 1 次提交
    • P
      KVM: svm: do not call kvm_set_cr0 from init_vmcb · 79a8059d
      Paolo Bonzini 提交于
      kvm_set_cr0 may want to call kvm_zap_gfn_range and thus access the
      memslots array (SRCU protected).  Using a mini SRCU critical section
      is ugly, and adding it to kvm_arch_vcpu_create doesn't work because
      the VMX vcpu_create callback calls synchronize_srcu.
      
      Fixes this lockdep splat:
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.3.0-rc1+ #1 Not tainted
      -------------------------------
      include/linux/kvm_host.h:488 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      rcu_scheduler_active = 1, debug_locks = 0
      1 lock held by qemu-system-i38/17000:
       #0:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: kvm_zap_gfn_range+0x24/0x1a0 [kvm]
      
      [...]
      Call Trace:
       dump_stack+0x4e/0x84
       lockdep_rcu_suspicious+0xfd/0x130
       kvm_zap_gfn_range+0x188/0x1a0 [kvm]
       kvm_set_cr0+0xde/0x1e0 [kvm]
       init_vmcb+0x760/0xad0 [kvm_amd]
       svm_create_vcpu+0x197/0x250 [kvm_amd]
       kvm_arch_vcpu_create+0x47/0x70 [kvm]
       kvm_vm_ioctl+0x302/0x7e0 [kvm]
       ? __lock_is_held+0x51/0x70
       ? __fget+0x101/0x210
       do_vfs_ioctl+0x2f4/0x560
       ? __fget_light+0x29/0x90
       SyS_ioctl+0x4c/0x90
       entry_SYSCALL_64_fastpath+0x16/0x73
      Reported-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      79a8059d
  2. 23 9月, 2015 1 次提交
  3. 21 9月, 2015 4 次提交
    • P
      KVM: x86: trap AMD MSRs for the TSeg base and mask · 3afb1121
      Paolo Bonzini 提交于
      These have roughly the same purpose as the SMRR, which we do not need
      to implement in KVM.  However, Linux accesses MSR_K8_TSEG_ADDR at
      boot, which causes problems when running a Xen dom0 under KVM.
      Just return 0, meaning that processor protection of SMRAM is not
      in effect.
      Reported-by: NM A Young <m.a.young@durham.ac.uk>
      Cc: stable@vger.kernel.org
      Acked-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3afb1121
    • T
      KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store() · 3eb4ee68
      Thomas Huth 提交于
      Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
      functions) has to be protected via the kvm->srcu lock.
      The kvmppc_h_logical_ci_load() and -store() functions are missing
      this lock so far, so let's add it there, too.
      This fixes the problem that the kernel reports "suspicious RCU usage"
      when lock debugging is enabled.
      
      Cc: stable@vger.kernel.org # v4.1+
      Fixes: 99342cf8Signed-off-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      3eb4ee68
    • G
      KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit · 7e022e71
      Gautham R. Shenoy 提交于
      In guest_exit_cont we call kvmhv_commence_exit which expects the trap
      number as the argument. However r3 doesn't contain the trap number at
      this point and as a result we would be calling the function with a
      spurious trap number.
      
      Fix this by copying r12 into r3 before calling kvmhv_commence_exit as
      r12 contains the trap number.
      
      Cc: stable@vger.kernel.org # v4.1+
      Fixes: eddb60fbSigned-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      7e022e71
    • P
      KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs · 5fc3e64f
      Paul Mackerras 提交于
      This fixes a bug which results in stale vcore pointers being left in
      the per-cpu preempted vcore lists when a VM is destroyed.  The result
      of the stale vcore pointers is usually either a crash or a lockup
      inside collect_piggybacks() when another VM is run.  A typical
      lockup message looks like:
      
      [  472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039]
      [  472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv]
      [  472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49
      [  472.162187] task: c000001e38512750 ti: c000001e41bfc000 task.ti: c000001e41bfc000
      [  472.162262] NIP: c00000000096b094 LR: c00000000096b08c CTR: c000000000111130
      [  472.162337] REGS: c000001e41bff520 TRAP: 0901   Not tainted  (4.2.0-kvm+)
      [  472.162399] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24848844  XER: 00000000
      [  472.162588] CFAR: c00000000096b0ac SOFTE: 1
      GPR00: c000000000111170 c000001e41bff7a0 c00000000127df00 0000000000000001
      GPR04: 0000000000000003 0000000000000001 0000000000000000 0000000000874821
      GPR08: c000001e41bff8e0 0000000000000001 0000000000000000 d00000000efde740
      GPR12: c000000000111130 c00000000fdae400
      [  472.163053] NIP [c00000000096b094] _raw_spin_lock_irqsave+0xa4/0x130
      [  472.163117] LR [c00000000096b08c] _raw_spin_lock_irqsave+0x9c/0x130
      [  472.163179] Call Trace:
      [  472.163206] [c000001e41bff7a0] [c000001e41bff7f0] 0xc000001e41bff7f0 (unreliable)
      [  472.163295] [c000001e41bff7e0] [c000000000111170] __wake_up+0x40/0x90
      [  472.163375] [c000001e41bff830] [d00000000efd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv]
      [  472.163465] [c000001e41bffa30] [d00000000efd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
      [  472.163559] [c000001e41bffb70] [d00000000e9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm]
      [  472.163653] [c000001e41bffba0] [d00000000e92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
      [  472.163745] [c000001e41bffbe0] [d00000000e9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
      [  472.163834] [c000001e41bffd40] [c0000000002d0f50] do_vfs_ioctl+0x480/0x7c0
      [  472.163910] [c000001e41bffde0] [c0000000002d1364] SyS_ioctl+0xd4/0xf0
      [  472.163986] [c000001e41bffe30] [c000000000009260] system_call+0x38/0xd0
      [  472.164060] Instruction dump:
      [  472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 60420000 8bad02e2
      [  472.164224] 7fc3f378 4b6a57c1 60000000 7c210b78 <e92d0000> 89290009 792affe3 40820070
      
      The bug is that kvmppc_run_vcpu does not correctly handle the case
      where a vcpu task receives a signal while its guest vcpu is executing
      in the guest as a result of being piggy-backed onto the execution of
      another vcore.  In that case we need to wait for the vcpu to finish
      executing inside the guest, and then remove this vcore from the
      preempted vcores list.  That way, we avoid leaving this vcpu's vcore
      on the preempted vcores list when the vcpu gets interrupted.
      
      Fixes: ec257165Reported-by: NThomas Huth <thuth@redhat.com>
      Tested-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      5fc3e64f
  4. 18 9月, 2015 2 次提交
    • I
      kvm: svm: reset mmu on VCPU reset · ebae871a
      Igor Mammedov 提交于
      When INIT/SIPI sequence is sent to VCPU which before that
      was in use by OS, VMRUN might fail with:
      
       KVM: entry failed, hardware error 0xffffffff
       EAX=00000000 EBX=00000000 ECX=00000000 EDX=000006d3
       ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
       EIP=00000000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
       ES =0000 00000000 0000ffff 00009300
       CS =9a00 0009a000 0000ffff 00009a00
       [...]
       CR0=60000010 CR2=b6f3e000 CR3=01942000 CR4=000007e0
       [...]
       EFER=0000000000000000
      
      with corresponding SVM error:
       KVM: FAILED VMRUN WITH VMCB:
       [...]
       cpl:            0                efer:         0000000000001000
       cr0:            0000000080010010 cr2:          00007fd7fe85bf90
       cr3:            0000000187d0c000 cr4:          0000000000000020
       [...]
      
      What happens is that VCPU state right after offlinig:
      CR0: 0x80050033  EFER: 0xd01  CR4: 0x7e0
        -> long mode with CR3 pointing to longmode page tables
      
      and when VCPU gets INIT/SIPI following transition happens
      CR0: 0 -> 0x60000010 EFER: 0x0  CR4: 0x7e0
        -> paging disabled with stale CR3
      
      However SVM under the hood puts VCPU in Paged Real Mode*
      which effectively translates CR0 0x60000010 -> 80010010 after
      
         svm_vcpu_reset()
             -> init_vmcb()
                 -> kvm_set_cr0()
                     -> svm_set_cr0()
      
      but from  kvm_set_cr0() perspective CR0: 0 -> 0x60000010
      only caching bits are changed and
      commit d81135a5
       ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")'
      regressed svm_vcpu_reset() which relied on MMU being reset.
      
      As result VMRUN after svm_vcpu_reset() tries to run
      VCPU in Paged Real Mode with stale MMU context (longmode page tables),
      which causes some AMD CPUs** to bail out with VMEXIT_INVALID.
      
      Fix issue by unconditionally resetting MMU context
      at init_vmcb() time.
      
      	* AMD64 Architecture Programmer’s Manual,
      	    Volume 2: System Programming, rev: 3.25
      	      15.19 Paged Real Mode
      	** Opteron 1216
      Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
      Fixes: d81135a5
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ebae871a
    • D
      sched: access local runqueue directly in single_task_running · 00cc1633
      Dominik Dingel 提交于
      Commit 2ee507c4 ("sched: Add function single_task_running to let a task
      check if it is the only task running on a cpu") referenced the current
      runqueue with the smp_processor_id.  When CONFIG_DEBUG_PREEMPT is enabled,
      that is only allowed if preemption is disabled or the currrent task is
      bound to the local cpu (e.g. kernel worker).
      
      With commit f7819512 ("kvm: add halt_poll_ns module parameter") KVM
      calls single_task_running. If CONFIG_DEBUG_PREEMPT is enabled that
      generates a lot of kernel messages.
      
      To avoid adding preemption in that cases, as it would limit the usefulness,
      we change single_task_running to access directly the cpu local runqueue.
      
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Fixes: 2ee507c4Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      00cc1633
  5. 17 9月, 2015 6 次提交
    • P
      Merge tag 'kvm-arm-for-4.3-rc2-2' of... · efe4d36a
      Paolo Bonzini 提交于
      Merge tag 'kvm-arm-for-4.3-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
      
      Second set of KVM/ARM changes for 4.3-rc2
      
      - Workaround for a Cortex-A57 erratum
      - Bug fix for the debugging infrastructure
      - Fix for 32bit guests with more than 4GB of address space
        on a 32bit host
      - A number of fixes for the (unusual) case when we don't use
        the in-kernel GIC emulation
      - Removal of ThumbEE handling on arm64, since these have been
        dropped from the architecture before anyone actually ever
        built a CPU
      - Remove the KVM_ARM_MAX_VCPUS limitation which has become
        fairly pointless
      efe4d36a
    • M
      arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' · ef748917
      Ming Lei 提交于
      This patch removes config option of KVM_ARM_MAX_VCPUS,
      and like other ARCHs, just choose the maximum allowed
      value from hardware, and follows the reasons:
      
      1) from distribution view, the option has to be
      defined as the max allowed value because it need to
      meet all kinds of virtulization applications and
      need to support most of SoCs;
      
      2) using a bigger value doesn't introduce extra memory
      consumption, and the help text in Kconfig isn't accurate
      because kvm_vpu structure isn't allocated until request
      of creating VCPU is sent from QEMU;
      
      3) the main effect is that the field of vcpus[] in 'struct kvm'
      becomes a bit bigger(sizeof(void *) per vcpu) and need more cache
      lines to hold the structure, but 'struct kvm' is one generic struct,
      and it has worked well on other ARCHs already in this way. Also,
      the world switch frequecy is often low, for example, it is ~2000
      when running kernel building load in VM from APM xgene KVM host,
      so the effect is very small, and the difference can't be observed
      in my test at all.
      
      Cc: Dann Frazier <dann.frazier@canonical.com>
      Signed-off-by: NMing Lei <ming.lei@canonical.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      ef748917
    • W
      arm64: KVM: Remove all traces of the ThumbEE registers · 34c3faa3
      Will Deacon 提交于
      Although the ThumbEE registers and traps were present in earlier
      versions of the v8 architecture, it was retrospectively removed and so
      we can do the same.
      
      Whilst this breaks migrating a guest started on a previous version of
      the kernel, it is much better to kill these (non existent) registers
      as soon as possible.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      [maz: added commend about migration]
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      34c3faa3
    • M
      arm: KVM: Disable virtual timer even if the guest is not using it · 688bc577
      Marc Zyngier 提交于
      When running a guest with the architected timer disabled (with QEMU and
      the kernel_irqchip=off option, for example), it is important to make
      sure the timer gets turned off. Otherwise, the guest may try to
      enable it anyway, leading to a screaming HW interrupt.
      
      The fix is to unconditionally turn off the virtual timer on guest
      exit.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      688bc577
    • M
      arm64: KVM: Disable virtual timer even if the guest is not using it · c4cbba9f
      Marc Zyngier 提交于
      When running a guest with the architected timer disabled (with QEMU and
      the kernel_irqchip=off option, for example), it is important to make
      sure the timer gets turned off. Otherwise, the guest may try to
      enable it anyway, leading to a screaming HW interrupt.
      
      The fix is to unconditionally turn off the virtual timer on guest
      exit.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      c4cbba9f
    • P
      arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources · c2f58514
      Pavel Fedin 提交于
      Until b26e5fda ("arm/arm64: KVM: introduce per-VM ops"),
      kvm_vgic_map_resources() used to include a check on irqchip_in_kernel(),
      and vgic_v2_map_resources() still has it.
      
      But now vm_ops are not initialized until we call kvm_vgic_create().
      Therefore kvm_vgic_map_resources() can being called without a VGIC,
      and we die because vm_ops.map_resources is NULL.
      
      Fixing this restores QEMU's kernel-irqchip=off option to a working state,
      allowing to use GIC emulation in userspace.
      
      Fixes: b26e5fda ("arm/arm64: KVM: introduce per-VM ops")
      Cc: stable@vger.kernel.org
      Signed-off-by: NPavel Fedin <p.fedin@samsung.com>
      [maz: reworked commit message]
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      c2f58514
  6. 16 9月, 2015 5 次提交
  7. 15 9月, 2015 5 次提交
    • J
      kvm: fix zero length mmio searching · 8f4216c7
      Jason Wang 提交于
      Currently, if we had a zero length mmio eventfd assigned on
      KVM_MMIO_BUS. It will never be found by kvm_io_bus_cmp() since it
      always compares the kvm_io_range() with the length that guest
      wrote. This will cause e.g for vhost, kick will be trapped by qemu
      userspace instead of vhost. Fixing this by using zero length if an
      iodevice is zero length.
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8f4216c7
    • J
      kvm: fix double free for fast mmio eventfd · eefd6b06
      Jason Wang 提交于
      We register wildcard mmio eventfd on two buses, once for KVM_MMIO_BUS
      and once on KVM_FAST_MMIO_BUS but with a single iodev
      instance. This will lead to an issue: kvm_io_bus_destroy() knows
      nothing about the devices on two buses pointing to a single dev. Which
      will lead to double free[1] during exit. Fix this by allocating two
      instances of iodevs then registering one on KVM_MMIO_BUS and another
      on KVM_FAST_MMIO_BUS.
      
      CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
      Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
      task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000
      RIP: 0010:[<ffffffffc07e25d8>]  [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
      RSP: 0018:ffff88020e7f3bc8  EFLAGS: 00010292
      RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d
      RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900
      RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d
      R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000
      R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880
      FS:  00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0
      Stack:
      ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622
      ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80
       0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8
      Call Trace:
      [<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm]
      [<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm]
      [<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm]
      [<ffffffff811f69f7>] __fput+0xe7/0x250
      [<ffffffff811f6bae>] ____fput+0xe/0x10
      [<ffffffff81093f04>] task_work_run+0xd4/0xf0
      [<ffffffff81079358>] do_exit+0x368/0xa50
      [<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60
      [<ffffffff81079ad5>] do_group_exit+0x45/0xb0
      [<ffffffff81085c71>] get_signal+0x291/0x750
      [<ffffffff810144d8>] do_signal+0x28/0xab0
      [<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0
      [<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20
      [<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170
      [<ffffffff81014fc9>] do_notify_resume+0x69/0xb0
      [<ffffffff817cb9af>] int_signal+0x12/0x17
      Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00
       RIP  [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
       RSP <ffff88020e7f3bc8>
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      eefd6b06
    • J
      kvm: factor out core eventfd assign/deassign logic · 85da11ca
      Jason Wang 提交于
      This patch factors out core eventfd assign/deassign logic and leaves
      the argument checking and bus index selection to callers.
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      85da11ca
    • J
      kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd · 8453fecb
      Jason Wang 提交于
      We only want zero length mmio eventfd to be registered on
      KVM_FAST_MMIO_BUS. So check this explicitly when arg->len is zero to
      make sure this.
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8453fecb
    • W
      KVM: make the declaration of functions within 80 characters · 81523aac
      Wei Yang 提交于
      After 'commit 0b8ba4a2 ("KVM: fix checkpatch.pl errors in
      kvm/coalesced_mmio.h")', the declaration of the two function will exceed 80
      characters.
      
      This patch reduces the TAPs to make each line in 80 characters.
      Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      81523aac
  8. 14 9月, 2015 3 次提交
  9. 13 9月, 2015 5 次提交
    • L
      Linux 4.3-rc1 · 6ff33f39
      Linus Torvalds 提交于
      6ff33f39
    • L
      Merge tag 'cris-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris · 6917b51d
      Linus Torvalds 提交于
      Pull CRIS updates from Jesper Nilsson:
       "Mostly removal of old cruft of which we can use a generic version, or
        fixes for code not commonly run in the cris port, but also additions
        to enable some good debug"
      
      * tag 'cris-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris: (25 commits)
        CRISv10: delete unused lib/dmacopy.c
        CRISv10: delete unused lib/old_checksum.c
        CRIS: fix switch_mm() lockdep splat
        CRISv32: enable LOCKDEP_SUPPORT
        CRIS: add STACKTRACE_SUPPORT
        CRISv32: annotate irq enable in idle loop
        CRISv32: add support for irqflags tracing
        CRIS: UAPI: use generic types.h
        CRIS: UAPI: use generic shmbuf.h
        CRIS: UAPI: use generic msgbuf.h
        CRIS: UAPI: use generic socket.h
        CRIS: UAPI: use generic sembuf.h
        CRIS: UAPI: use generic sockios.h
        CRIS: UAPI: use generic auxvec.h
        CRIS: UAPI: use generic headers via Kbuild
        CRIS: UAPI: fix elf.h export
        CRIS: don't make asm/elf.h depend on asm/user.h
        CRIS: UAPI: fix ptrace.h
        CRISv32: Squash compile warnings for axisflashmap
        CRISv32: Add GPIO driver to the default configs
        ...
      6917b51d
    • L
      blk: rq_data_dir() should not return a boolean · 10fbd36e
      Linus Torvalds 提交于
      rq_data_dir() returns either READ or WRITE (0 == READ, 1 == WRITE), not
      a boolean value.
      
      Now, admittedly the "!= 0" doesn't really change the value (0 stays as
      zero, 1 stays as one), but it's not only redundant, it confuses gcc, and
      causes gcc to warn about the construct
      
          switch (rq_data_dir(req)) {
              case READ:
                  ...
              case WRITE:
                  ...
      
      that we have in a few drivers.
      
      Now, the gcc warning is silly and stupid (it seems to warn not about the
      switch value having a different type from the case statements, but about
      _any_ boolean switch value), but in this case the code itself is silly
      and stupid too, so let's just change it, and get rid of warnings like
      this:
      
        drivers/block/hd.c: In function ‘hd_request’:
        drivers/block/hd.c:630:11: warning: switch condition has boolean value [-Wswitch-bool]
           switch (rq_data_dir(req)) {
      
      The odd '!= 0' came in when "cmd_flags" got turned into a "u64" in
      commit 5953316d ("block: make rq->cmd_flags be 64-bit") and is
      presumably because the old code (that just did a logical 'and' with 1)
      would then end up making the type of rq_data_dir() be u64 too.
      
      But if we want to retain the old regular integer type, let's just cast
      the result to 'int' rather than use that rather odd '!= 0'.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      10fbd36e
    • L
      Merge branch 'writeback-plugging' · e1df8b0a
      Linus Torvalds 提交于
      Fix up the writeback plugging introduced in commit d353d758
      ("writeback: plug writeback at a high level") that then caused problems
      due to the unplug happening with a spinlock held.
      
      * writeback-plugging:
        writeback: plug writeback in wb_writeback() and writeback_inodes_wb()
        Revert "writeback: plug writeback at a high level"
      e1df8b0a
    • L
      writeback: plug writeback in wb_writeback() and writeback_inodes_wb() · 505a666e
      Linus Torvalds 提交于
      We had to revert the pluggin in writeback_sb_inodes() because the
      wb->list_lock is held, but we could easily plug at a higher level before
      taking that lock, and unplug after releasing it.  This does that.
      
      Chris will run performance numbers, just to verify that this approach is
      comparable to the alternative (we could just drop and re-take the lock
      around the blk_finish_plug() rather than these two commits.
      
      I'd have preferred waiting for actual performance numbers before picking
      one approach over the other, but I don't want to release rc1 with the
      known "sleeping function called from invalid context" issue, so I'll
      pick this cleanup version for now.  But if the numbers show that we
      really want to plug just at the writeback_sb_inodes() level, and we
      should just play ugly games with the spinlock, we'll switch to that.
      
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      505a666e
  10. 12 9月, 2015 8 次提交
    • L
      thermal: fix intel PCH thermal driver mismerge · dfb22fc5
      Linus Torvalds 提交于
      I didn't notice this when merging the thermal code from Zhang, but his
      merge (commit 5a924a07: "Merge branches 'thermal-core' and
      'thermal-intel' of .git into next") of the thermal-core and
      thermal-intel branches was wrong.
      
      In thermal-core, commit 17e8351a ("thermal: consistently use int for
      temperatures") converted the thermal layer to use "int" for
      temperatures.
      
      But in parallel, in the thermal-intel branch commit d0a12625
      ("thermal: Add Intel PCH thermal driver") added support for the intel
      PCH thermal sensor using the old interfaces that used "unsigned long"
      pointers.
      
      This resulted in warnings like this:
      
        drivers/thermal/intel_pch_thermal.c:184:14: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
          .get_temp = pch_thermal_get_temp,
                      ^
        drivers/thermal/intel_pch_thermal.c:184:14: note: (near initialization for ‘tzd_ops.get_temp’)
        drivers/thermal/intel_pch_thermal.c:186:19: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
          .get_trip_temp = pch_get_trip_temp,
                           ^
        drivers/thermal/intel_pch_thermal.c:186:19: note: (near initialization for ‘tzd_ops.get_trip_temp’)
      
      This fixes it.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dfb22fc5
    • L
      Merge branch 'akpm' (patches from Andrew) · 01b0c014
      Linus Torvalds 提交于
      Merge fourth patch-bomb from Andrew Morton:
      
       - sys_membarier syscall
      
       - seq_file interface changes
      
       - a few misc fixups
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        revert "ocfs2/dlm: use list_for_each_entry instead of list_for_each"
        mm/early_ioremap: add explicit #include of asm/early_ioremap.h
        fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void
        selftests: enhance membarrier syscall test
        selftests: add membarrier syscall test
        sys_membarrier(): system-wide memory barrier (generic, x86)
        MODSIGN: fix a compilation warning in extract-cert
      01b0c014
    • V
      ARCv2: [axs103_smp] Reduce clk for SMP FPGA configs · 3ebb0540
      Vineet Gupta 提交于
      Newer bitfiles needs the reduced clk even for SMP builds
      
      Cc: <stable@vger.kernel.org>  #4.2
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ebb0540
    • L
      Merge tag 'ntb-4.3' of git://github.com/jonmason/ntb · ded0e250
      Linus Torvalds 提交于
      Pull NTB fixes from Jon Mason:
       "NTB bug and documentation fixes, new device IDs, performance
        improvements, and adding a mailing list to MAINTAINERS for NTB"
      
      * tag 'ntb-4.3' of git://github.com/jonmason/ntb:
        NTB: Fix range check on memory window index
        NTB: Improve index handling in B2B MW workaround
        NTB: Fix documentation for ntb_peer_db_clear.
        NTB: Fix documentation for ntb_link_is_up
        NTB: Use unique DMA channels for TX and RX
        NTB: Remove dma_sync_wait from ntb_async_rx
        NTB: Clean up QP stats info
        NTB: Make the transport list in order of discovery
        NTB: Add PCI Device IDs for Broadwell Xeon
        NTB: Add flow control to the ntb_netdev
        NTB: Add list to MAINTAINERS
      ded0e250
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · f0c032d8
      Linus Torvalds 提交于
      Pull more input updates from Dmitry Torokhov:
       "Second round of updates for the input subsystem.
      
        This introduces two brand new touchscreen drivers (Colibri and
        imx6ul_tsc), some small driver fixes, and we are no longer report
        errors from evdev_flush() as users do not really have a way of
        handling errors, error codes that we were returning were not on the
        list of errors supposed to be returned by close(), and errors were
        causing issues with one of older versions of systemd"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: imx_keypad - remove obsolete comment
        Input: touchscreen - add imx6ul_tsc driver support
        Input: Add touchscreen support for Colibri VF50
        Input: i8042 - lower log level for "no controller" message
        Input: evdev - do not report errors form flush()
        Input: elants_i2c - extend the calibration timeout to 12 seconds
        Input: sparcspkr - fix module autoload for OF platform drivers
        Input: regulator-haptic - fix module autoload for OF platform driver
        Input: pwm-beeper - fix module autoload for OF platform driver
        Input: ab8500-ponkey - Fix module autoload for OF platform driver
        Input: cyttsp - remove unnecessary MODULE_ALIAS()
        Input: elan_i2c - add ACPI ID "ELAN1000"
      f0c032d8
    • L
      Merge tag 'pm+acpi-4.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · fa9a67ef
      Linus Torvalds 提交于
      Pull more power management and ACPI updates from Rafael Wysocki:
       "These are mostly fixes and cleanups on top of the previous PM+ACPI
        pull request (cpufreq core and drivers, cpuidle, generic power domains
        framework).  Some of them didn't make to that pull request and some
        fix issues introduced by it.
      
        The only really new thing is the support for suspend frequency in the
        cpufreq-dt driver, but it is needed to fix an issue with Exynos
        platforms.
      
        Specifics:
      
         - build fix for the new Mediatek MT8173 cpufreq driver (Guenter
           Roeck).
      
         - generic power domains framework fixes (power on error code path,
           subdomain removal) and cleanup of a deprecated API user (Geert
           Uytterhoeven, Jon Hunter, Ulf Hansson).
      
         - cpufreq-dt driver fixes including two fixes for bugs related to the
           new Operating Performance Points Device Tree bindings introduced
           recently (Viresh Kumar).
      
         - suspend frequency support for the cpufreq-dt driver (Bartlomiej
           Zolnierkiewicz, Viresh Kumar).
      
         - cpufreq core cleanups (Viresh Kumar).
      
         - intel_pstate driver fixes (Chen Yu, Kristen Carlson Accardi).
      
         - additional sanity check in the cpuidle core (Xunlei Pang).
      
         - fix for a comment related to CPU power management (Lina Iyer)"
      
      * tag 'pm+acpi-4.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        intel_pstate: fix PCT_TO_HWP macro
        intel_pstate: Fix user input of min/max to legal policy region
        PM / OPP: Return suspend_opp only if it is enabled
        cpufreq-dt: add suspend frequency support
        cpufreq: allow cpufreq_generic_suspend() to work without suspend frequency
        PM / OPP: add dev_pm_opp_get_suspend_opp() helper
        staging: board: Migrate away from __pm_genpd_name_add_device()
        cpufreq: Use __func__ to print function's name
        cpufreq: staticize cpufreq_cpu_get_raw()
        PM / Domains: Ensure subdomain is not in use before removing
        cpufreq: Add ARM_MT8173_CPUFREQ dependency on THERMAL
        cpuidle/coupled: Add sanity check for safe_state_index
        PM / Domains: Try power off masters in error path of __pm_genpd_poweron()
        cpufreq: dt: Tolerance applies on both sides of target voltage
        cpufreq: dt: Print error on failing to mark OPPs as shared
        cpufreq: dt: Check OPP count before marking them shared
        kernel/cpu_pm: fix cpu_cluster_pm_exit comment
      fa9a67ef
    • L
      Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 05c78081
      Linus Torvalds 提交于
      Pull SCSI target updates from Nicholas Bellinger:
       "Here are the outstanding target-pending updates for v4.3-rc1.
      
        Mostly bug-fixes and minor changes this round.  The fallout from the
        big v4.2-rc1 RCU conversion have (thus far) been minimal.
      
        The highlights this round include:
      
         - Move sense handling routines into scsi_common code (Sagi)
      
         - Return ABORTED_COMMAND sense key for PI errors (Sagi)
      
         - Add tpg_enabled_sendtargets attribute for disabled iscsi-target
           discovery (David)
      
         - Shrink target struct se_cmd by rearranging fields (Roland)
      
         - Drop iSCSI use of mutex around max_cmd_sn increment (Roland)
      
         - Replace iSCSI __kernel_sockaddr_storage with sockaddr_storage (Andy +
           Chris)
      
         - Honor fabric max_data_sg_nents I/O transfer limit (Arun + Himanshu +
           nab)
      
         - Fix EXTENDED_COPY >= v4.1 regression OOPsen (Alex + nab)"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (37 commits)
        target: use stringify.h instead of own definition
        target/user: Fix UFLAG_UNKNOWN_OP handling
        target: Remove no-op conditional
        target/user: Remove unused variable
        target: Fix max_cmd_sn increment w/o cmdsn mutex regressions
        target: Attach EXTENDED_COPY local I/O descriptors to xcopy_pt_sess
        target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
        target/iscsi: Replace __kernel_sockaddr_storage with sockaddr_storage
        target/iscsi: Replace conn->login_ip with login_sockaddr
        target/iscsi: Keep local_ip as the actual sockaddr
        target/iscsi: Fix np_ip bracket issue by removing np_ip
        target: Drop iSCSI use of mutex around max_cmd_sn increment
        qla2xxx: Update tcm_qla2xxx module description to 24xx+
        iscsi-target: Add tpg_enabled_sendtargets for disabled discovery
        drivers: target: Drop unlikely before IS_ERR(_OR_NULL)
        target: check DPO/FUA usage for COMPARE AND WRITE
        target: Shrink struct se_cmd by rearranging fields
        target: Remove cmd->se_ordered_id (unused except debug log lines)
        target: add support for START_STOP_UNIT SCSI opcode
        target: improve unsupported opcode message
        ...
      05c78081
    • L
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 8e78b7dc
      Linus Torvalds 提交于
      Pull second round of SCSI updates from James Bottomley:
       "There's one late arriving patch here (added today), fixing a build
        issue which the scsi_dh patch set in here uncovered.  Other than that,
        everything has been incubated in -next and the checkers for a week.
      
        The major pieces of this patch are a set patches facilitating better
        integration between scsi and scsi_dh (the device handling layer used
        by multi-path; all the dm parts are acked by Mike Snitzer).
      
        This also includes driver updates for mp3sas, scsi_debug and an
        assortment of bug fixes"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (50 commits)
        scsi_dh: fix randconfig build error
        scsi: fix scsi_error_handler vs. scsi_host_dev_release race
        fcoe: Convert use of __constant_htons to htons
        mpt2sas: setpci reset kernel oops fix
        pm80xx: Don't override ts->stat on IO_OPEN_CNX_ERROR_HW_RESOURCE_BUSY
        lpfc: Fix possible use-after-free and double free in lpfc_mbx_cmpl_rdp_page_a2()
        bfa: Fix incorrect de-reference of pointer
        bfa: Fix indentation
        scsi_transport_sas: Remove check for SAS expander when querying bay/enclosure IDs.
        scsi_debug: resp_request: remove unused variable
        scsi_debug: fix REPORT LUNS Well Known LU
        scsi_debug: schedule_resp fix input variable check
        scsi_debug: make dump_sector static
        scsi_debug: vfree is null safe so drop the check
        scsi_debug: use SCSI_W_LUN_REPORT_LUNS instead of SAM2_WLUN_REPORT_LUNS;
        scsi_debug: define pr_fmt() for consistent logging
        mpt2sas: Refcount fw_events and fix unsafe list usage
        mpt2sas: Refcount sas_device objects and fix unsafe list usage
        scsi_dh: return SCSI_DH_NOTCONN in scsi_dh_activate()
        scsi_dh: don't allow to detach device handlers at runtime
        ...
      8e78b7dc