1. 21 2月, 2019 1 次提交
    • B
      kvm: Add memcg accounting to KVM allocations · b12ce36a
      Ben Gardon 提交于
      There are many KVM kernel memory allocations which are tied to the life of
      the VM process and should be charged to the VM process's cgroup. If the
      allocations aren't tied to the process, the OOM killer will not know
      that killing the process will free the associated kernel memory.
      Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
      charged to the VM process's cgroup.
      
      Tested:
      	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
      	introduced no new failures.
      	Ran a kernel memory accounting test which creates a VM to touch
      	memory and then checks that the kernel memory allocated for the
      	process is within certain bounds.
      	With this patch we account for much more of the vmalloc and slab memory
      	allocated for the VM.
      
      There remain a few allocations which should be charged to the VM's
      cgroup but are not. In they include:
              vcpu->run
              kvm->coalesced_mmio_ring
      There allocations are unaccounted in this patch because they are mapped
      to userspace, and accounting them to a cgroup causes problems. This
      should be addressed in a future patch.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b12ce36a
  2. 18 7月, 2018 2 次提交
  3. 26 5月, 2018 1 次提交
  4. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  5. 28 11月, 2017 2 次提交
  6. 19 9月, 2017 1 次提交
  7. 15 9月, 2017 1 次提交
  8. 07 7月, 2017 1 次提交
    • C
      KVM: mark kvm->busses as rcu protected · 4a12f951
      Christian Borntraeger 提交于
      mark kvm->busses as rcu protected and use the correct access
      function everywhere.
      
      found by sparse
      virt/kvm/kvm_main.c:3490:15: error: incompatible types in comparison expression (different address spaces)
      virt/kvm/kvm_main.c:3509:15: error: incompatible types in comparison expression (different address spaces)
      virt/kvm/kvm_main.c:3561:15: error: incompatible types in comparison expression (different address spaces)
      virt/kvm/kvm_main.c:3644:15: error: incompatible types in comparison expression (different address spaces)
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      4a12f951
  9. 20 6月, 2017 1 次提交
    • I
      sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar 提交于
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ac6424b9
  10. 13 4月, 2017 1 次提交
  11. 24 3月, 2017 1 次提交
  12. 26 10月, 2016 1 次提交
    • P
      KVM: fix OOPS on flush_work · 36343f6e
      Paolo Bonzini 提交于
      The conversion done by commit 3706feac ("KVM: Remove deprecated
      create_singlethread_workqueue") is broken.  It flushes a single work
      item &irqfd->shutdown instead of all of them, and even worse if there
      is no irqfd on the list then you get a NULL pointer dereference.
      Revert the virt/kvm/eventfd.c part of that patch; to avoid the
      deprecated function, just allocate our own workqueue---it does
      not even have to be unbound---with alloc_workqueue.
      
      Fixes: 3706feacReviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      36343f6e
  13. 08 9月, 2016 1 次提交
    • B
      KVM: Remove deprecated create_singlethread_workqueue · 3706feac
      Bhaktipriya Shridhar 提交于
      The workqueue "irqfd_cleanup_wq" queues a single work item
      &irqfd->shutdown and hence doesn't require ordering. It is a host-wide
      workqueue for issuing deferred shutdown requests aggregated from all
      vm* instances. It is not being used on a memory reclaim path.
      Hence, it has been converted to use system_wq.
      The work item has been flushed in kvm_irqfd_release().
      
      The workqueue "wqueue" queues a single work item &timer->expired
      and hence doesn't require ordering. Also, it is not being used on
      a memory reclaim path. Hence, it has been converted to use system_wq.
      
      System workqueues have been able to handle high level of concurrency
      for a long time now and hence it's not required to have a singlethreaded
      workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
      created with create_singlethread_workqueue(), system_wq allows multiple
      work items to overlap executions even on the same CPU; however, a
      per-cpu workqueue doesn't have any CPU locality or global ordering
      guarantee unless the target CPU is explicitly specified and thus the
      increase of local concurrency shouldn't make any difference.
      Signed-off-by: NBhaktipriya Shridhar <bhaktipriya96@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3706feac
  14. 12 5月, 2016 1 次提交
  15. 04 11月, 2015 1 次提交
  16. 16 10月, 2015 3 次提交
  17. 01 10月, 2015 5 次提交
  18. 15 9月, 2015 3 次提交
    • J
      kvm: fix double free for fast mmio eventfd · eefd6b06
      Jason Wang 提交于
      We register wildcard mmio eventfd on two buses, once for KVM_MMIO_BUS
      and once on KVM_FAST_MMIO_BUS but with a single iodev
      instance. This will lead to an issue: kvm_io_bus_destroy() knows
      nothing about the devices on two buses pointing to a single dev. Which
      will lead to double free[1] during exit. Fix this by allocating two
      instances of iodevs then registering one on KVM_MMIO_BUS and another
      on KVM_FAST_MMIO_BUS.
      
      CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
      Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
      task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000
      RIP: 0010:[<ffffffffc07e25d8>]  [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
      RSP: 0018:ffff88020e7f3bc8  EFLAGS: 00010292
      RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d
      RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900
      RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d
      R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000
      R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880
      FS:  00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0
      Stack:
      ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622
      ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80
       0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8
      Call Trace:
      [<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm]
      [<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm]
      [<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm]
      [<ffffffff811f69f7>] __fput+0xe7/0x250
      [<ffffffff811f6bae>] ____fput+0xe/0x10
      [<ffffffff81093f04>] task_work_run+0xd4/0xf0
      [<ffffffff81079358>] do_exit+0x368/0xa50
      [<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60
      [<ffffffff81079ad5>] do_group_exit+0x45/0xb0
      [<ffffffff81085c71>] get_signal+0x291/0x750
      [<ffffffff810144d8>] do_signal+0x28/0xab0
      [<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0
      [<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20
      [<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170
      [<ffffffff81014fc9>] do_notify_resume+0x69/0xb0
      [<ffffffff817cb9af>] int_signal+0x12/0x17
      Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00
       RIP  [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
       RSP <ffff88020e7f3bc8>
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      eefd6b06
    • J
      kvm: factor out core eventfd assign/deassign logic · 85da11ca
      Jason Wang 提交于
      This patch factors out core eventfd assign/deassign logic and leaves
      the argument checking and bus index selection to callers.
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      85da11ca
    • J
      kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd · 8453fecb
      Jason Wang 提交于
      We only want zero length mmio eventfd to be registered on
      KVM_FAST_MMIO_BUS. So check this explicitly when arg->len is zero to
      make sure this.
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8453fecb
  19. 27 3月, 2015 2 次提交
  20. 12 3月, 2015 1 次提交
  21. 22 11月, 2014 1 次提交
  22. 24 9月, 2014 1 次提交
  23. 11 9月, 2014 1 次提交
  24. 06 8月, 2014 1 次提交
    • P
      KVM: Move more code under CONFIG_HAVE_KVM_IRQFD · c77dcacb
      Paolo Bonzini 提交于
      Commits e4d57e1e (KVM: Move irq notifier implementation into
      eventfd.c, 2014-06-30) included the irq notifier code unconditionally
      in eventfd.c, while it was under CONFIG_HAVE_KVM_IRQCHIP before.
      
      Similarly, commit 297e2105 (KVM: Give IRQFD its own separate enabling
      Kconfig option, 2014-06-30) moved code from CONFIG_HAVE_IRQ_ROUTING
      to CONFIG_HAVE_KVM_IRQFD but forgot to move the pieces that used to be
      under CONFIG_HAVE_KVM_IRQCHIP.
      
      Together, this broke compilation without CONFIG_KVM_XICS.  Fix by adding
      or changing the #ifdefs so that they point at CONFIG_HAVE_KVM_IRQFD.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c77dcacb
  25. 05 8月, 2014 5 次提交