1. 10 Jun 2009, 10 commits
    • KVM: unify part of generic timer handling · d3c7b77d
      Committed by Marcelo Tosatti
      Hide the internals of vcpu awakening / injection from the in-kernel
      emulated timers. This makes future changes in this logic easier and
      decreases the distance to more generic timer handling.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
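      A sketch of where this is heading, assuming the 2.6.30-era struct
      kvm_vcpu (which had a wait_queue_head_t wq); none of the field or
      function names below are taken from the patch itself. Every emulated
      timer keeps one generic descriptor, and a single hrtimer callback
      handles the vcpu wakeup each device used to open-code:

          #include <linux/hrtimer.h>
          #include <linux/kvm_host.h>

          /* Illustrative generic timer shared by the emulated devices. */
          struct kvm_timer {
              struct hrtimer timer;   /* underlying high-resolution timer */
              s64 period;             /* period in ns, 0 for one-shot */
              atomic_t pending;       /* expirations not yet injected */
              struct kvm_vcpu *vcpu;  /* vcpu to wake when the timer fires */
          };

          /* One expiry path for every emulated timer. */
          static enum hrtimer_restart kvm_timer_fn(struct hrtimer *data)
          {
              struct kvm_timer *kt = container_of(data, struct kvm_timer, timer);

              atomic_inc(&kt->pending);              /* record the tick */
              wake_up_interruptible(&kt->vcpu->wq);  /* kick a halted vcpu */

              if (kt->period) {                      /* periodic: re-arm */
                  hrtimer_add_expires_ns(&kt->timer, kt->period);
                  return HRTIMER_RESTART;
              }
              return HRTIMER_NORESTART;
          }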
    • KVM: PIT: remove usage of count_load_time for channel 0 · fd668423
      Committed by Marcelo Tosatti
      We can infer elapsed time from hrtimer_expires_remaining.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
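      A minimal sketch of that inference (names are illustrative, and the
      arithmetic assumes a periodic channel):

          #include <linux/hrtimer.h>

          /* Derive how far into the current period we are from what is
           * still outstanding on the hrtimer, instead of storing the
           * load time and computing "now - count_load_time". */
          static s64 pit_elapsed_ns(struct hrtimer *timer, s64 period_ns)
          {
              s64 remaining = ktime_to_ns(hrtimer_expires_remaining(timer));

              if (remaining < 0)  /* already expired, not yet re-armed */
                  remaining = 0;

              return period_ns - remaining;
          }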
    • KVM: PIT: remove unused scheduled variable · 5a05d545
      Committed by Marcelo Tosatti
      Unused.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: paravirt skip pit-through-ioapic boot check · a90ede7b
      Committed by Marcelo Tosatti
      Skip the test, aimed at buggy hardware, which checks whether the PIT
      is properly routed when using the IOAPIC.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: silence preempt warning on kvm_write_guest_time · 2dea4c84
      Committed by Matt T. Yourst
      This issue just appeared in kvm-84 when running on 2.6.28.7 (x86-64)
      with PREEMPT enabled.
      
      We're getting syslog warnings like this on many (but not all) of the
      occasions when qemu tells KVM to run the VCPU:
      
      BUG: using smp_processor_id() in preemptible [00000000] code:
      qemu-system-x86/28938
      caller is kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
      Pid: 28938, comm: qemu-system-x86 2.6.28.7-mtyrel-64bit
      Call Trace:
      debug_smp_processor_id+0xf7/0x100
      kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
      ? __wake_up+0x4e/0x70
      ? wake_futex+0x27/0x40
      kvm_vcpu_ioctl+0x2e9/0x5a0 [kvm]
      enqueue_hrtimer+0x8a/0x110
      _spin_unlock_irqrestore+0x27/0x50
      vfs_ioctl+0x31/0xa0
      do_vfs_ioctl+0x74/0x480
      sys_futex+0xb4/0x140
      sys_ioctl+0x99/0xa0
      system_call_fastpath+0x16/0x1b
      
      As it turns out, the call trace is messed up due to gcc's inlining, but
      I isolated the problem anyway: kvm_write_guest_time() is being used in a
      non-thread-safe manner on preemptable kernels.
      
      Basically kvm_write_guest_time()'s body needs to be surrounded by
      preempt_disable() and preempt_enable(), since the kernel won't let us
      query any per-CPU data (indirectly using smp_processor_id()) without
      preemption disabled. The attached patch fixes this issue by disabling
      preemption inside kvm_write_guest_time().
      
      [marcelo: surround only __get_cpu_var calls since the warning
      is harmless]
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
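      Reduced to its shape, the final fix looks like the sketch below; the
      per-cpu variable is illustrative, standing in for the TSC rate the
      function reads:

          #include <linux/percpu.h>
          #include <linux/preempt.h>

          static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);  /* stand-in */

          /* Bracket only the per-cpu access: a slightly stale value is
           * harmless here, so the goal is merely to make the access legal
           * (smp_processor_id() requires preemption off), not to close a
           * race. */
          static unsigned long local_tsc_khz(void)
          {
              unsigned long khz;

              preempt_disable();
              khz = __this_cpu_read(cpu_tsc_khz);  /* needs preemption off */
              preempt_enable();

              return khz;
          }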
    • KVM: Enable MSI-X for KVM assigned device · d510d6cc
      Committed by Sheng Yang
      This patch finally enables MSI-X.
      
      What we need for MSI-X:
      1. Intercept one page in the device's MMIO region, so that we can get the
      guest's desired MSI-X table and set up the real one. This is now done by
      the guest and transferred to the kernel using the ioctls KVM_SET_MSIX_NR
      and KVM_SET_MSIX_ENTRY.
      
      2. Information for the incoming interrupt. One device can now have more
      than one interrupt, all handled by one workqueue structure, so we need to
      identify them. The previous patch enabled gsi_msg_pending_bitmap to get
      this done.
      
      3. A mapping from host IRQ to guest gsi, as well as from guest gsi to the
      real MSI/MSI-X message address/data. We use the same entry number for the
      host and the guest here, so it's easy to find the correlated guest gsi.
      
      What we lack for now:
      1. The PCI spec says nothing may exist in the same MMIO page as the MSI-X
      table, except the pending bits. The patch ignores the pending bits as a
      first step (so they are always 0 - no pending).
      
      2. The PCI spec allows the MSI-X table to be changed dynamically. That
      means the OS can enable MSI-X, then mask one MSI-X entry, modify it, and
      unmask it. The patch doesn't support this, and Linux doesn't work that
      way either.
      
      3. The patch doesn't implement MSI-X mask-all or masking of a single
      entry. I would implement the former in drivers/pci/msi.c later; for a
      single entry, userspace should have the responsibility to handle it.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
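      A sketch of point 3 above: because host and guest use the same table
      entry number, mapping an incoming host IRQ back to the guest gsi is a
      simple scan. All names here are illustrative, not taken from the patch.

          #include <linux/types.h>

          struct msix_route {
              u32 host_irq;   /* host vector handed to request_irq() */
              u32 guest_gsi;  /* gsi the guest programmed for this entry */
          };

          static int find_guest_gsi(const struct msix_route *rt, int nr,
                                    u32 host_irq)
          {
              int i;

              for (i = 0; i < nr; i++)
                  if (rt[i].host_irq == host_irq)
                      return rt[i].guest_gsi;
              return -1;  /* not one of ours */
          }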
    • KVM: bit ops for deliver_bitmap · bfd349d0
      Committed by Sheng Yang
      It's also convenient for when we extend the number of vcpus KVM supports
      in the future.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Update intr delivery func to accept unsigned long* bitmap · 110c2fae
      Committed by Sheng Yang
      This would be used with bit ops, and can easily be extended if
      KVM_MAX_VCPUS is increased.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
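      Together with the previous commit, the pattern is roughly the sketch
      below (names illustrative): a bitmap of vcpu ids needs no code change
      when KVM_MAX_VCPUS grows, and delivery just walks the set bits.

          #include <linux/bitmap.h>
          #include <linux/bitops.h>

          #define MAX_VCPUS 16  /* stand-in for KVM_MAX_VCPUS */

          static void deliver(unsigned long *bitmap, void (*inject)(int id))
          {
              int id;

              for (id = find_first_bit(bitmap, MAX_VCPUS);
                   id < MAX_VCPUS;
                   id = find_next_bit(bitmap, MAX_VCPUS, id + 1))
                  inject(id);  /* deliver the interrupt to this vcpu */
          }

          /* Usage:
           *   DECLARE_BITMAP(deliver_bitmap, MAX_VCPUS);
           *   bitmap_zero(deliver_bitmap, MAX_VCPUS);
           *   __set_bit(dest_id, deliver_bitmap);
           *   deliver(deliver_bitmap, inject_fn);
           */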
    • KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE · 5897297b
      Committed by Avi Kivity
      Windows 2008 accesses this MSR often on context-switch-intensive workloads;
      since we run in guest context with the guest MSR value loaded (so swapgs can
      work correctly), we can simply disable interception of rdmsr/wrmsr for this
      MSR.
      
      A complication occurs since in legacy mode, we run with the host MSR value
      loaded. In this case we enable interception.  This means we need two MSR
      bitmaps, one for legacy mode and one for long mode.
      Signed-off-by: Avi Kivity <avi@redhat.com>
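      For reference, the VMX MSR-bitmap mechanics this relies on (layout per
      the Intel SDM) reduce to clearing one read bit and one write bit; the
      helper below is a sketch, not the patch's code. With two such bitmaps,
      the long-mode one clears the MSR_KERNEL_GS_BASE bits while the
      legacy-mode one leaves them set.

          #include <linux/bitops.h>
          #include <linux/types.h>

          /* One 4K page = 32768 bits, in four 8192-bit quadrants:
           * read-low, read-high, write-low, write-high. A clear bit means
           * "do not exit on this access". */
          static void disable_msr_intercept(unsigned long *bitmap, u32 msr)
          {
              if (msr <= 0x1fff) {                    /* low MSR range */
                  __clear_bit(msr, bitmap);           /* read */
                  __clear_bit(msr + 0x4000, bitmap);  /* write */
              } else if (msr >= 0xc0000000 && msr <= 0xc0001fff) {
                  u32 off = msr & 0x1fff;             /* high MSR range */
                  __clear_bit(off + 0x2000, bitmap);  /* read */
                  __clear_bit(off + 0x6000, bitmap);  /* write */
              }
          }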
    • KVM: VMX: Don't use highmem pages for the msr and pio bitmaps · 3e7c73e9
      Committed by Avi Kivity
      Highmem pages are a pain, and saving three lowmem pages on i386 isn't worth
      the extra code.
      Signed-off-by: Avi Kivity <avi@redhat.com>
  2. 09 Jun 2009, 1 commit
  3. 08 Jun 2009, 5 commits
  4. 06 Jun 2009, 1 commit
  5. 05 Jun 2009, 1 commit
  6. 04 Jun 2009, 3 commits
  7. 03 Jun 2009, 3 commits
  8. 02 Jun 2009, 1 commit
  9. 30 May 2009, 2 commits
  10. 29 May 2009, 5 commits
    • x86: ignore VM_LOCKED when determining if hugetlb-backed page tables can be shared or not · 32b154c0
      Committed by Mel Gorman
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302
      
      On x86 and x86-64, it is possible that page tables are shared between
      shared mappings backed by hugetlbfs.  As part of this,
      page_table_shareable() checks a pair of vma->vm_flags and they must match
      if they are to be shared.  All VMA flags are taken into account, including
      VM_LOCKED.
      
      The problem is that VM_LOCKED is cleared on fork().  When a process with a
      shared memory segment forks() to exec() a helper, there will be shared
      VMAs with different flags.  The impact is that the shared segment is
      sometimes considered shareable and other times not, depending on what
      process is checking.
      
      What happens is that the segment page tables are being shared but the
      count is inaccurate depending on the ordering of events.  As the page
      tables are freed with put_page(), bad pmd's are found when some of the
      children exit.  The hugepage counters also get corrupted and the Total and
      Free count will no longer match even when all the hugepage-backed regions
      are freed.  This requires a reboot of the machine to "fix".
      
      This patch addresses the problem by comparing all flags except VM_LOCKED
      when deciding if pagetables should be shared or not for hugetlbfs-backed
      mappings.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <starlight@binnacle.cx>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
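      The core of the fix, simplified into a sketch (the function name is
      illustrative): compare the two VMAs' flags with VM_LOCKED masked out,
      since fork() clears it in the child and it has no bearing on whether
      the page tables themselves may be shared.

          #include <linux/mm.h>

          static bool hugetlb_vma_flags_shareable(unsigned long a,
                                                  unsigned long b)
          {
              unsigned long mask = ~(unsigned long)VM_LOCKED;

              return (a & mask) == (b & mask);  /* ignore mlock status */
          }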
    • flat: fix data sections alignment · c3dc5bec
      Committed by Oskar Schirmer
      The flat loader uses an architecture's flat_stack_align() to align the
      stack but assumes word-alignment is enough for the data sections.
      
      However, on the Xtensa S6000 we have registers up to 128bit width
      which can be used from userspace and therefore need userspace stack and
      data-section alignment of at least this size.
      
      This patch drops flat_stack_align() and uses the same alignment that
      is required for slab caches, ARCH_SLAB_MINALIGN, or wordsize if it's
      not defined by the architecture.
      
      It also fixes m32r, which was obviously kaput, aligning an
      uninitialized stack entry instead of the stack pointer.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Oskar Schirmer <os@emlix.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Bryan Wu <cooloney@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Signed-off-by: Johannes Weiner <jw@emlix.com>
      Acked-by: Mike Frysinger <vapier.adi@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
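      The resulting alignment rule, as a sketch (the macro spelling follows
      the description above and may differ from the final code):

          #include <linux/slab.h>  /* may define ARCH_SLAB_MINALIGN */

          #ifdef ARCH_SLAB_MINALIGN
          #define FLAT_DATA_ALIGN (ARCH_SLAB_MINALIGN)
          #else
          #define FLAT_DATA_ALIGN (sizeof(void *))  /* word-size fallback */
          #endif

          /* Round a stack or data pointer down to that alignment. */
          #define FLAT_ALIGN_DOWN(p) \
              ((unsigned long)(p) & ~(unsigned long)(FLAT_DATA_ALIGN - 1))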
    • [ARM] update mach-types · 6daad5c6
      Committed by Russell King
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • [ARM] Add cmpxchg support for ARMv6+ systems (v5) · ecd322c9
      Committed by Mathieu Desnoyers
      Add cmpxchg/cmpxchg64 support for ARMv6K and ARMv7 systems
      (original patch from Catalin Marinas <catalin.marinas@arm.com>)
      
      The cmpxchg and cmpxchg64 functions can be implemented using the
      LDREX*/STREX* instructions. Since operand lengths other than 32 bits are
      required, the full implementations are only available if the ARMv6K
      extensions are present (for the LDREXB, LDREXH and LDREXD instructions).
      
      For ARMv6, only 32-bit cmpxchg is available.
      
      Mathieu:
      
      Make cmpxchg_local always available, with the best implementation for
      all type sizes (1, 2 and 4 bytes).
      Make cmpxchg64_local always available.
      
      Use the "Ir" constraint for the "old" operand, as atomic_cmpxchg in
      atomic.h does.
      
      Changes since v3:
      - Add "memory" clobbers (thanks to Nicolas Pitre)
      - Remove __asmeq(); it is only needed for old compilers, which are very
        unlikely on ARMv6+.
      
      Note: ARMv7-M should eventually be ifdef'd out of cmpxchg64, but it's
      not supported by the Linux kernel currently.
      
      Put back arm < v6 cmpxchg support.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      CC: Catalin Marinas <catalin.marinas@arm.com>
      CC: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
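      A sketch of the 32-bit case, modeled on the usual ARM LDREX/STREX
      retry loop rather than copied from the patch:

          #include <linux/types.h>

          /* Returns the value found in *ptr; the store only happens if it
           * matched 'old', and the loop retries when the exclusive store
           * is beaten by another CPU. */
          static inline u32 cmpxchg_u32(volatile u32 *ptr, u32 old, u32 newval)
          {
              u32 oldval;
              unsigned long res;

              do {
                  asm volatile(
                  "ldrex   %1, [%2]\n"       /* exclusive load */
                  "mov     %0, #0\n"
                  "teq     %1, %3\n"         /* matches expected value? */
                  "strexeq %0, %4, [%2]\n"   /* conditional exclusive store */
                      : "=&r" (res), "=&r" (oldval)
                      : "r" (ptr), "Ir" (old), "r" (newval)
                      : "memory", "cc");
              } while (res);

              return oldval;
          }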
    • [ARM] barriers: improve xchg, bitops and atomic SMP barriers · bac4e960
      Committed by Russell King
      Mathieu Desnoyers pointed out that the ARM barriers were lacking:
      
      - cmpxchg, xchg and atomic add-return need memory barriers on
        architectures which can reorder the order in which memory reads and
        writes are seen between CPUs, which seems to include recent ARM
        architectures. Those barriers are currently missing on ARM.
      
      - test_and_xxx_bit were missing SMP barriers.
      
      So put these barriers in.  Provide separate atomic_add/atomic_sub
      operations which do not require barriers.
      Reported-Reviewed-and-Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
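      The shape of the atomic_add_return fix, as a sketch (the asm
      constraints follow the kernel's ARM atomic helpers; the function name
      is mine):

          #include <linux/atomic.h>

          static inline int atomic_add_return_sketch(int i, atomic_t *v)
          {
              unsigned long tmp;
              int result;

              smp_mb();  /* order earlier accesses before the RMW */
              __asm__ __volatile__(
              "1: ldrex   %0, [%3]\n"
              "   add     %0, %0, %4\n"
              "   strex   %1, %0, [%3]\n"
              "   teq     %1, #0\n"
              "   bne     1b"
                  : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
                  : "r" (&v->counter), "Ir" (i)
                  : "cc");
              smp_mb();  /* order the RMW before later accesses */

              return result;
          }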
  11. 28 May 2009, 1 commit
  12. 27 May 2009, 7 commits