1. 16 10月, 2015 2 次提交
  2. 01 10月, 2015 5 次提交
  3. 15 9月, 2015 3 次提交
    • J
      kvm: fix double free for fast mmio eventfd · eefd6b06
      Jason Wang 提交于
      We register wildcard mmio eventfd on two buses, once for KVM_MMIO_BUS
      and once on KVM_FAST_MMIO_BUS but with a single iodev
      instance. This will lead to an issue: kvm_io_bus_destroy() knows
      nothing about the devices on two buses pointing to a single dev. Which
      will lead to double free[1] during exit. Fix this by allocating two
      instances of iodevs then registering one on KVM_MMIO_BUS and another
      on KVM_FAST_MMIO_BUS.
      
      CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
      Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
      task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000
      RIP: 0010:[<ffffffffc07e25d8>]  [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
      RSP: 0018:ffff88020e7f3bc8  EFLAGS: 00010292
      RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d
      RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900
      RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d
      R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000
      R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880
      FS:  00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0
      Stack:
      ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622
      ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80
       0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8
      Call Trace:
      [<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm]
      [<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm]
      [<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm]
      [<ffffffff811f69f7>] __fput+0xe7/0x250
      [<ffffffff811f6bae>] ____fput+0xe/0x10
      [<ffffffff81093f04>] task_work_run+0xd4/0xf0
      [<ffffffff81079358>] do_exit+0x368/0xa50
      [<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60
      [<ffffffff81079ad5>] do_group_exit+0x45/0xb0
      [<ffffffff81085c71>] get_signal+0x291/0x750
      [<ffffffff810144d8>] do_signal+0x28/0xab0
      [<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0
      [<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20
      [<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170
      [<ffffffff81014fc9>] do_notify_resume+0x69/0xb0
      [<ffffffff817cb9af>] int_signal+0x12/0x17
      Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00
       RIP  [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
       RSP <ffff88020e7f3bc8>
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      eefd6b06
    • J
      kvm: factor out core eventfd assign/deassign logic · 85da11ca
      Jason Wang 提交于
      This patch factors out core eventfd assign/deassign logic and leaves
      the argument checking and bus index selection to callers.
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      85da11ca
    • J
      kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd · 8453fecb
      Jason Wang 提交于
      We only want zero length mmio eventfd to be registered on
      KVM_FAST_MMIO_BUS. So check this explicitly when arg->len is zero to
      make sure this.
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8453fecb
  4. 27 3月, 2015 2 次提交
  5. 12 3月, 2015 1 次提交
  6. 22 11月, 2014 1 次提交
  7. 24 9月, 2014 1 次提交
  8. 11 9月, 2014 1 次提交
  9. 06 8月, 2014 1 次提交
    • P
      KVM: Move more code under CONFIG_HAVE_KVM_IRQFD · c77dcacb
      Paolo Bonzini 提交于
      Commits e4d57e1e (KVM: Move irq notifier implementation into
      eventfd.c, 2014-06-30) included the irq notifier code unconditionally
      in eventfd.c, while it was under CONFIG_HAVE_KVM_IRQCHIP before.
      
      Similarly, commit 297e2105 (KVM: Give IRQFD its own separate enabling
      Kconfig option, 2014-06-30) moved code from CONFIG_HAVE_IRQ_ROUTING
      to CONFIG_HAVE_KVM_IRQFD but forgot to move the pieces that used to be
      under CONFIG_HAVE_KVM_IRQCHIP.
      
      Together, this broke compilation without CONFIG_KVM_XICS.  Fix by adding
      or changing the #ifdefs so that they point at CONFIG_HAVE_KVM_IRQFD.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c77dcacb
  10. 05 8月, 2014 5 次提交
  11. 05 5月, 2014 1 次提交
    • C
      kvm/irqchip: Speed up KVM_SET_GSI_ROUTING · 719d93cd
      Christian Borntraeger 提交于
      When starting lots of dataplane devices the bootup takes very long on
      Christian's s390 with irqfd patches. With larger setups he is even
      able to trigger some timeouts in some components.  Turns out that the
      KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec)
      when having multiple CPUs.  This is caused by the  synchronize_rcu and
      the HZ=100 of s390.  By changing the code to use a private srcu we can
      speed things up.  This patch reduces the boot time till mounting root
      from 8 to 2 seconds on my s390 guest with 100 disks.
      
      Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
      are fine because they do not have lockdep checks (hlist_for_each_entry_rcu
      uses rcu_dereference_raw rather than rcu_dereference, and write-sides
      do not do rcu lockdep at all).
      
      Note that we're hardly relying on the "sleepable" part of srcu.  We just
      want SRCU's faster detection of grace periods.
      
      Testing was done by Andrew Theurer using netperf tests STREAM, MAERTS
      and RR.  The difference between results "before" and "after" the patch
      has mean -0.2% and standard deviation 0.6%.  Using a paired t-test on the
      data points says that there is a 2.5% probability that the patch is the
      cause of the performance difference (rather than a random fluctuation).
      
      (Restricting the t-test to RR, which is the most likely to be affected,
      changes the numbers to respectively -0.3% mean, 0.7% stdev, and 8%
      probability that the numbers actually say something about the patch.
      The probability increases mostly because there are fewer data points).
      
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390
      Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      719d93cd
  12. 18 4月, 2014 2 次提交
    • M
      KVM: VMX: speed up wildcard MMIO EVENTFD · 68c3b4d1
      Michael S. Tsirkin 提交于
      With KVM, MMIO is much slower than PIO, due to the need to
      do page walk and emulation. But with EPT, it does not have to be: we
      know the address from the VMCS so if the address is unique, we can look
      up the eventfd directly, bypassing emulation.
      
      Unfortunately, this only works if userspace does not need to match on
      access length and data.  The implementation adds a separate FAST_MMIO
      bus internally. This serves two purposes:
          - minimize overhead for old userspace that does not use eventfd with lengtth = 0
          - minimize disruption in other code (since we don't know the length,
            devices on the MMIO bus only get a valid address in write, this
            way we don't need to touch all devices to teach them to handle
            an invalid length)
      
      At the moment, this optimization only has effect for EPT on x86.
      
      It will be possible to speed up MMIO for NPT and MMU using the same
      idea in the future.
      
      With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
      I was unable to detect any measureable slowdown to non-eventfd MMIO.
      
      Making MMIO faster is important for the upcoming virtio 1.0 which
      includes an MMIO signalling capability.
      
      The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
      pre-review and suggestions.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      68c3b4d1
    • M
      KVM: support any-length wildcard ioeventfd · f848a5a8
      Michael S. Tsirkin 提交于
      It is sometimes benefitial to ignore IO size, and only match on address.
      In hindsight this would have been a better default than matching length
      when KVM_IOEVENTFD_FLAG_DATAMATCH is not set, In particular, this kind
      of access can be optimized on VMX: there no need to do page lookups.
      This can currently be done with many ioeventfds but in a suboptimal way.
      
      However we can't change kernel/userspace ABI without risk of breaking
      some applications.
      Use len = 0 to mean "ignore length for matching" in a more optimal way.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      f848a5a8
  13. 19 3月, 2014 1 次提交
    • C
      KVM: eventfd: Fix lock order inversion. · 684a0b71
      Cornelia Huck 提交于
      When registering a new irqfd, we call its ->poll method to collect any
      event that might have previously been pending so that we can trigger it.
      This is done under the kvm->irqfds.lock, which means the eventfd's ctx
      lock is taken under it.
      
      However, if we get a POLLHUP in irqfd_wakeup, we will be called with the
      ctx lock held before getting the irqfds.lock to deactivate the irqfd,
      causing lockdep to complain.
      
      Calling the ->poll method does not really need the irqfds.lock, so let's
      just move it after we've given up the irqfds.lock in kvm_irqfd_assign().
      Signed-off-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      684a0b71
  14. 04 9月, 2013 1 次提交
  15. 04 6月, 2013 1 次提交
    • A
      kvm: exclude ioeventfd from counting kvm_io_range limit · 6ea34c9b
      Amos Kong 提交于
      We can easily reach the 1000 limit by start VM with a couple
      hundred I/O devices (multifunction=on). The hardcode limit
      already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000).
      
      In userspace, we already have maximum file descriptor to
      limit ioeventfd count. But kvm_io_bus devices also are used
      for pit, pic, ioapic, coalesced_mmio. They couldn't be limited
      by maximum file descriptor.
      
      Currently only ioeventfds take too much kvm_io_bus devices,
      so just exclude it from counting kvm_io_range limit.
      
      Also fixed one indent issue in kvm_host.h
      Signed-off-by: NAmos Kong <akong@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      6ea34c9b
  16. 27 4月, 2013 1 次提交
  17. 16 4月, 2013 1 次提交
  18. 07 4月, 2013 1 次提交
  19. 06 3月, 2013 2 次提交
  20. 28 2月, 2013 1 次提交
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin 提交于
      I'm not sure why, but the hlist for each entry iterators were conceived
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
  21. 11 12月, 2012 1 次提交
  22. 06 12月, 2012 1 次提交
  23. 23 9月, 2012 1 次提交
    • A
      KVM: Add resampling irqfds for level triggered interrupts · 7a84428a
      Alex Williamson 提交于
      To emulate level triggered interrupts, add a resample option to
      KVM_IRQFD.  When specified, a new resamplefd is provided that notifies
      the user when the irqchip has been resampled by the VM.  This may, for
      instance, indicate an EOI.  Also in this mode, posting of an interrupt
      through an irqfd only asserts the interrupt.  On resampling, the
      interrupt is automatically de-asserted prior to user notification.
      This enables level triggered interrupts to be posted and re-enabled
      from vfio with no userspace intervention.
      
      All resampling irqfds can make use of a single irq source ID, so we
      reserve a new one for this interface.
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      7a84428a
  24. 21 8月, 2012 1 次提交
    • T
      workqueue: deprecate flush[_delayed]_work_sync() · 43829731
      Tejun Heo 提交于
      flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
      and convert all users to flush[_delayed]_work().
      
      If you're cc'd and wondering what's going on: Now all workqueues are
      non-reentrant and the regular flushes guarantee that the work item is
      not pending or running on any CPU on return, so there's no reason to
      use the sync flushes at all and they're going away.
      
      This patch doesn't make any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mattia Dongili <malattia@linux.it>
      Cc: Kent Yoder <key@linux.vnet.ibm.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: Bryan Wu <bryan.wu@canonical.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-wireless@vger.kernel.org
      Cc: Anton Vorontsov <cbou@mail.ru>
      Cc: Sangbeom Kim <sbkim73@samsung.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Avi Kivity <avi@redhat.com> 
      43829731
  25. 03 7月, 2012 2 次提交