1. 18 11月, 2012 8 次提交
  2. 17 11月, 2012 1 次提交
    • T
      KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update() · 29282fde
      Takashi Iwai 提交于
      The commit [ad756a16: KVM: VMX: Implement PCID/INVPCID for guests with
      EPT] introduced the unconditional access to SECONDARY_VM_EXEC_CONTROL,
      and this triggers kernel warnings like below on old CPUs:
      
          vmwrite error: reg 401e value a0568000 (err 12)
          Pid: 13649, comm: qemu-kvm Not tainted 3.7.0-rc4-test2+ #154
          Call Trace:
           [<ffffffffa0558d86>] vmwrite_error+0x27/0x29 [kvm_intel]
           [<ffffffffa054e8cb>] vmcs_writel+0x1b/0x20 [kvm_intel]
           [<ffffffffa054f114>] vmx_cpuid_update+0x74/0x170 [kvm_intel]
           [<ffffffffa03629b6>] kvm_vcpu_ioctl_set_cpuid2+0x76/0x90 [kvm]
           [<ffffffffa0341c67>] kvm_arch_vcpu_ioctl+0xc37/0xed0 [kvm]
           [<ffffffff81143f7c>] ? __vunmap+0x9c/0x110
           [<ffffffffa0551489>] ? vmx_vcpu_load+0x39/0x1a0 [kvm_intel]
           [<ffffffffa0340ee2>] ? kvm_arch_vcpu_load+0x52/0x1a0 [kvm]
           [<ffffffffa032dcd4>] ? vcpu_load+0x74/0xd0 [kvm]
           [<ffffffffa032deb0>] kvm_vcpu_ioctl+0x110/0x5e0 [kvm]
           [<ffffffffa032e93d>] ? kvm_dev_ioctl+0x4d/0x4a0 [kvm]
           [<ffffffff8117dc6f>] do_vfs_ioctl+0x8f/0x530
           [<ffffffff81139d76>] ? remove_vma+0x56/0x60
           [<ffffffff8113b708>] ? do_munmap+0x328/0x400
           [<ffffffff81187c8c>] ? fget_light+0x4c/0x100
           [<ffffffff8117e1a1>] sys_ioctl+0x91/0xb0
           [<ffffffff815a942d>] system_call_fastpath+0x1a/0x1f
      
      This patch adds a check for the availability of secondary exec
      control to avoid these warnings.
      
      Cc: <stable@vger.kernel.org> [v3.6+]
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      29282fde
  3. 13 11月, 2012 1 次提交
    • P
      KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461) · 6d1068b3
      Petr Matousek 提交于
      On hosts without the XSAVE support unprivileged local user can trigger
      oops similar to the one below by setting X86_CR4_OSXSAVE bit in guest
      cr4 register using KVM_SET_SREGS ioctl and later issuing KVM_RUN
      ioctl.
      
      invalid opcode: 0000 [#2] SMP
      Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
      ...
      Pid: 24935, comm: zoog_kvm_monito Tainted: G      D      3.2.0-3-686-pae
      EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
      EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
      EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
      ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
      task.ti=d7c62000)
      Stack:
       00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
       ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
       c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
      Call Trace:
       [<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
      ...
       [<c12bfb44>] ? syscall_call+0x7/0xb
      Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
      1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
      d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
      EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
      0068:d7c63e70
      
      QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
      and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
      out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
      X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
      X86_FEATURE_XSAVE even on hosts that do not support it, might be
      susceptible to this attack from inside the guest as well.
      
      Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.
      Signed-off-by: NPetr Matousek <pmatouse@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      6d1068b3
  4. 04 11月, 2012 1 次提交
    • J
      xen/hypercall: fix hypercall fallback code for very old hypervisors · cf47a83f
      Jan Beulich 提交于
      While copying the argument structures in HYPERVISOR_event_channel_op()
      and HYPERVISOR_physdev_op() into the local variable is sufficiently
      safe even if the actual structure is smaller than the container one,
      copying back eventual output values the same way isn't: This may
      collide with on-stack variables (particularly "rc") which may change
      between the first and second memcpy() (i.e. the second memcpy() could
      discard that change).
      
      Move the fallback code into out-of-line functions, and handle all of
      the operations known by this old a hypervisor individually: Some don't
      require copying back anything at all, and for the rest use the
      individual argument structures' sizes rather than the container's.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      [v2: Reduce #define/#undef usage in HYPERVISOR_physdev_op_compat().]
      [v3: Fix compile errors when modules use said hypercalls]
      [v4: Add xen_ prefix to the HYPERCALL_..]
      [v5: Alter the name and only EXPORT_SYMBOL_GPL one of them]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cf47a83f
  5. 01 11月, 2012 2 次提交
    • X
      KVM: x86: fix vcpu->mmio_fragments overflow · 87da7e66
      Xiao Guangrong 提交于
      After commit b3356bf0 (KVM: emulator: optimize "rep ins" handling),
      the pieces of io data can be collected and write them to the guest memory
      or MMIO together
      
      Unfortunately, kvm splits the mmio access into 8 bytes and store them to
      vcpu->mmio_fragments. If the guest uses "rep ins" to move large data, it
      will cause vcpu->mmio_fragments overflow
      
      The bug can be exposed by isapc (-M isapc):
      
      [23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ ......]
      [23154.858083] Call Trace:
      [23154.859874]  [<ffffffffa04f0e17>] kvm_get_cr8+0x1d/0x28 [kvm]
      [23154.861677]  [<ffffffffa04fa6d4>] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
      [23154.863604]  [<ffffffffa04f5a1a>] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]
      
      Actually, we can use one mmio_fragment to store a large mmio access then
      split it when we pass the mmio-exit-info to userspace. After that, we only
      need two entries to store mmio info for the cross-mmio pages access
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      87da7e66
    • K
      xen/mmu: Use Xen specific TLB flush instead of the generic one. · 95a7d768
      Konrad Rzeszutek Wilk 提交于
      As Mukesh explained it, the MMUEXT_TLB_FLUSH_ALL allows the
      hypervisor to do a TLB flush on all active vCPUs. If instead
      we were using the generic one (which ends up being xen_flush_tlb)
      we end up making the MMUEXT_TLB_FLUSH_LOCAL hypercall. But
      before we make that hypercall the kernel will IPI all of the
      vCPUs (even those that were asleep from the hypervisor
      perspective). The end result is that we needlessly wake them
      up and do a TLB flush when we can just let the hypervisor
      do it correctly.
      
      This patch gives around 50% speed improvement when migrating
      idle guest's from one host to another.
      
      Oracle-bug: 14630170
      
      CC: stable@vger.kernel.org
      Tested-by: NJingjie Jiang <jingjie.jiang@oracle.com>
      Suggested-by: NMukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      95a7d768
  6. 30 10月, 2012 1 次提交
  7. 26 10月, 2012 2 次提交
  8. 25 10月, 2012 3 次提交
  9. 24 10月, 2012 13 次提交
    • D
      x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt · 94777fc5
      Dimitri Sivanich 提交于
      Posting this patch to fix an issue concerning sparse irq's that
      I raised a while back.  There was discussion about adding
      refcounting to sparse irqs (to fix other potential race
      conditions), but that does not appear to have been addressed
      yet.  This covers the only issue of this type that I've
      encountered in this area.
      
      A NULL pointer dereference can occur in
      smp_irq_move_cleanup_interrupt() if we haven't yet setup the
      irq_cfg pointer in the irq_desc.irq_data.chip_data.
      
      In create_irq_nr() there is a window where we have set
      vector_irq in __assign_irq_vector(), but not yet called
      irq_set_chip_data() to set the irq_cfg pointer.
      
      Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during
      this time, smp_irq_move_cleanup_interrupt() will attempt to
      process the aforementioned irq, but panic when accessing
      irq_cfg.
      
      Only continue processing the irq if irq_cfg is non-NULL.
      Signed-off-by: NDimitri Sivanich <sivanich@sgi.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Joerg Roedel <joerg.roedel@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Link: http://lkml.kernel.org/r/20121016125021.GA22935@sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      94777fc5
    • W
      perf/x86: Remove unused variable in nhmex_rbox_alter_er() · 64dfab8e
      Wei Yongjun 提交于
      The variable port is initialized but never used
      otherwise, so remove the unused variable.
      
      dpatch engine is used to auto generate this patch.
      (https://github.com/weiyj/dpatch)
      Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: Yan, Zheng <zheng.z.yan@intel.com>
      Cc: a.p.zijlstra@chello.nl
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Link: http://lkml.kernel.org/r/CAPgLHd8NZkYSkZm22FpZxiEh6HcA0q-V%3D29vdnheiDhgrJZ%2Byw@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      64dfab8e
    • M
      x86/efi: Fix oops caused by incorrect set_memory_uc() usage · 3e8fa263
      Matt Fleming 提交于
      Calling __pa() with an ioremap'd address is invalid. If we
      encounter an efi_memory_desc_t without EFI_MEMORY_WB set in
      ->attribute we currently call set_memory_uc(), which in turn
      calls __pa() on a potentially ioremap'd address.
      
      On CONFIG_X86_32 this results in the following oops:
      
        BUG: unable to handle kernel paging request at f7f22280
        IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210
        *pdpt = 0000000001978001 *pde = 0000000001ffb067 *pte = 0000000000000000
        Oops: 0000 [#1] PREEMPT SMP
        Modules linked in:
      
        Pid: 0, comm: swapper Not tainted 3.0.0-acpi-efi-0805 #3
         EIP: 0060:[<c10257b9>] EFLAGS: 00010202 CPU: 0
         EIP is at reserve_ram_pages_type+0x89/0x210
         EAX: 0070e280 EBX: 38714000 ECX: f7814000 EDX: 00000000
         ESI: 00000000 EDI: 38715000 EBP: c189fef0 ESP: c189fea8
         DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
        Process swapper (pid: 0, ti=c189e000 task=c18bbe60 task.ti=c189e000)
        Stack:
         80000200 ff108000 00000000 c189ff00 00038714 00000000 00000000 c189fed0
         c104f8ca 00038714 00000000 00038715 00000000 00000000 00038715 00000000
         00000010 38715000 c189ff48 c1025aff 38715000 00000000 00000010 00000000
        Call Trace:
         [<c104f8ca>] ? page_is_ram+0x1a/0x40
         [<c1025aff>] reserve_memtype+0xdf/0x2f0
         [<c1024dc9>] set_memory_uc+0x49/0xa0
         [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa
         [<c19216d4>] start_kernel+0x291/0x2f2
         [<c19211c7>] ? loglevel+0x1b/0x1b
         [<c19210bf>] i386_start_kernel+0xbf/0xc8
      
      The only time we can call set_memory_uc() for a memory region is
      when it is part of the direct kernel mapping. For the case where
      we ioremap a memory region we must leave it alone.
      
      This patch reimplements the fix from e8c71062 ("x86, efi:
      Calling __pa() with an ioremap()ed address is invalid") which
      was reverted in e1ad783b because it caused a regression on
      some MacBooks (they hung at boot). The regression was caused
      because the commit only marked EFI_RUNTIME_SERVICES_DATA as
      E820_RESERVED_EFI, when it should have marked all regions that
      have the EFI_MEMORY_RUNTIME attribute.
      
      Despite first impressions, it's not possible to use
      ioremap_cache() to map all cached memory regions on
      CONFIG_X86_64 because of the way that the memory map might be
      configured as detailed in the following bug report,
      
      	https://bugzilla.redhat.com/show_bug.cgi?id=748516
      
      e.g. some of the EFI memory regions *need* to be mapped as part
      of the direct kernel mapping.
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Cc: Matthew Garrett <mjg@redhat.com>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Huang Ying <huang.ying.caritas@gmail.com>
      Cc: Keith Packard <keithp@keithp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1350649546-23541-1-git-send-email-matt@console-pimps.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3e8fa263
    • V
      perf/x86: Enable overflow on Intel KNC with a custom knc_pmu_handle_irq() · e4074b30
      Vince Weaver 提交于
      Although based on the Intel P6 design, the interrupt mechnanism
      for KNC more closely resembles the Intel architectural
      perfmon one.
      
      We can't just re-use that code though, because KNC has different
      MSR numbers for the status and ack registers.
      
      In this case we just cut-and paste from perf_event_intel.c
      with some minor changes, as it looks like it would not be
      worth the trouble to change that code to be MSR-configurable.
      Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: eranian@gmail.com
      Cc: Meadows Lawrence F <lawrence.f.meadows@intel.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171304410.23243@vincent-weaver-1.um.maine.edu
      [ Small stylistic edits. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e4074b30
    • V
      perf/x86: Remove cpuc->enable check on Intl KNC event enable/disable · 7d011962
      Vince Weaver 提交于
      x86_pmu.enable() is called from x86_pmu_enable() with
      cpuc->enabled set to 0.  This means we weren't re-enabling the
      counters after a context switch.
      
      This patch just removes the check, as it should't be necessary
      (and the equivelent x86_ generic code does not have the checks).
      
      The origin of this problem is the KNC driver being based on the
      P6 one.   The P6 driver also has this issue, but works anyway
      due to various lucky accidents.
      Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: eranian@gmail.com
      Cc: Meadows
      Cc: Lawrence F <lawrence.f.meadows@intel.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171303290.23243@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7d011962
    • V
      perf/x86: Make Intel KNC use full 40-bit width of counters · ae5ba47a
      Vince Weaver 提交于
      Early versions of Intel KNC chips have a bug where bits above 32
      were not properly set.  We worked around this by only using the
      bottom 32 bits (out of 40 that should be available).
      
      It turns out this workaround breaks overflow handling.
      
      The buggy silicon will in theory never be used in production
      systems, so remove this workaround so we get proper overflow
      support.
      Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: eranian@gmail.com
      Cc: Meadows Lawrence F <lawrence.f.meadows@intel.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171302140.23243@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ae5ba47a
    • Y
      perf/x86/uncore: Handle pci_read_config_dword() errors · 032c3851
      Yan, Zheng 提交于
      This, beyond handling corner cases, also fixes some build warnings:
      
       arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_disable_box’:
       arch/x86/kernel/cpu/perf_event_intel_uncore.c:124:9: warning: ‘config’ is used uninitialized in this function [-Wuninitialized]
       arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_enable_box’:
       arch/x86/kernel/cpu/perf_event_intel_uncore.c:135:9: warning: ‘config’ is used uninitialized in this function [-Wuninitialized]
       arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_read_counter’:
       arch/x86/kernel/cpu/perf_event_intel_uncore.c:164:2: warning: ‘count’ is used uninitialized in this function [-Wuninitialized]
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Cc: a.p.zijlstra@chello.nl
      Link: http://lkml.kernel.org/r/1351068140-13456-1-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      032c3851
    • J
      x86-64: Fix page table accounting · 876ee61a
      Jan Beulich 提交于
      Commit 20167d34 ("x86-64: Fix
      accounting in kernel_physical_mapping_init()") went a little too
      far by entirely removing the counting of pre-populated page
      tables: this should be done at boot time (to cover the page
      tables set up in early boot code), but shouldn't be done during
      memory hot add.
      
      Hence, re-add the removed increments of "pages", but make them
      and the one in phys_pte_init() conditional upon !after_bootmem.
      Reported-Acked-and-Tested-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/506DAFBA020000780009FA8C@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      876ee61a
    • V
      perf/x86: Remove P6 cpuc->enabled check · 58e9eaf0
      Vince Weaver 提交于
      Between 2.6.33 and 2.6.34 the PMU code was made modular.
      
      The x86_pmu_enable() call was extended to disable cpuc->enabled
      and iterate the counters, enabling one at a time, before calling
      enable_all() at the end, followed by re-enabling cpuc->enabled.
      
      Since cpuc->enabled was set to 0, that change effectively caused
      the "val |= ARCH_PERFMON_EVENTSEL_ENABLE;" code in p6_pmu_enable_event()
      and p6_pmu_disable_event() to be dead code that was never called.
      
      This change removes this code (which was confusing) and adds some
      extra commentary to make it more clear what is going on.
      Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191732000.14552@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
      58e9eaf0
    • V
      perf/x86: Update/fix generic events on P6 PMU · e09df478
      Vince Weaver 提交于
      This patch updates the generic events on p6, including some new
      extended cache events.
      
      Values for these events were taken from the equivelant PAPI
      predefined events.
      
      Tested on a Pentium II.
      Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191730080.14552@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e09df478
    • V
      perf/x86: Fix P6 FP_ASSIST event constraint · 7991c9ca
      Vince Weaver 提交于
      According to Intel SDM Volume 3B, FP_ASSIST is limited to Counter 1 only,
      not Counter 0.
      
      Tested on a Pentium II.
      Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191728570.14552@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7991c9ca
    • D
      Revert "x86/mm: Fix the size calculation of mapping tables" · 7b16bbf9
      Dave Young 提交于
      Commit:
      
         722bc6b1 x86/mm: Fix the size calculation of mapping tables
      
      Tried to address the issue that the first 2/4M should use 4k pages
      if PSE enabled, but extra counts should only be valid for x86_32.
      
      This commit caused a kdump regression: the kdump kernel hangs.
      
      Work is in progress to fundamentally fix the various page table
      initialization issues that we have, via the design suggested
      by H. Peter Anvin, but it's not ready yet to be merged.
      
      So, to get a working kdump revert to the last known working version,
      which is the revert of this commit and of a followup fix (which was
      incomplete):
      
         bd2753b2 x86/mm: Only add extra pages count for the first memory range during pre-allocation
      
      Tested kdump on physical and virtual machines.
      Signed-off-by: NDave Young <dyoung@redhat.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NFlavio Leitner <fbl@redhat.com>
      Tested-by: NFlavio Leitner <fbl@redhat.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Flavio Leitner <fbl@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: ianfang.cn@gmail.com
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      7b16bbf9
    • A
      x86/perf: Fix virtualization sanity check · bffd5fc2
      Andre Przywara 提交于
      In check_hw_exists() we try to detect non-emulated MSR accesses
      by writing an arbitrary value into one of the PMU registers
      and check if it's value after a readout is still the same.
      This algorithm silently assumes that the register does not contain
      the magic value already, which is wrong in at least one situation.
      
      Fix the algorithm to really do a read-modify-write cycle. This fixes
      a warning under Xen under some circumstances on AMD family 10h CPUs.
      
      The reasons in more details actually sound like a story from
      Believe It or Not!:
      
      First you need an AMD family 10h/12h CPU. These do not reset the
      PERF_CTR registers on a reboot.
      Now you boot bare metal Linux, which goes successfully through this
      check, but leaves the magic value of 0xabcd in the register. You
      don't use the performance counters, but do a reboot (warm reset).
      Then you choose to boot Xen. The check will be triggered with a
      recent Linux kernel as Dom0 again, trying to write 0xabcd into the
      MSR. Xen silently drops the write (expected), but the subsequent read
      will return the value in the register, which just happens to be the
      expected magic value. Thus the test misleadingly succeeds, leaving
      the kernel in the belief that the PMU is available. This will trigger
      the following message:
      
      [    0.020294] ------------[ cut here ]------------
      [    0.020311] WARNING: at arch/x86/xen/enlighten.c:730 xen_apic_write+0x15/0x17()
      [    0.020318] Hardware name: empty
      [    0.020323] Modules linked in:
      [    0.020334] Pid: 1, comm: swapper/0 Not tainted 3.3.8 #7
      [    0.020340] Call Trace:
      [    0.020354]  [<ffffffff81050379>] warn_slowpath_common+0x80/0x98
      [    0.020369]  [<ffffffff810503a6>] warn_slowpath_null+0x15/0x17
      [    0.020378]  [<ffffffff810034df>] xen_apic_write+0x15/0x17
      [    0.020392]  [<ffffffff8101cb2b>] perf_events_lapic_init+0x2e/0x30
      [    0.020410]  [<ffffffff81ee4dd0>] init_hw_perf_events+0x250/0x407
      [    0.020419]  [<ffffffff81ee4b80>] ? check_bugs+0x2d/0x2d
      [    0.020430]  [<ffffffff81002181>] do_one_initcall+0x7a/0x131
      [    0.020444]  [<ffffffff81edbbf9>] kernel_init+0x91/0x15d
      [    0.020456]  [<ffffffff817caaa4>] kernel_thread_helper+0x4/0x10
      [    0.020471]  [<ffffffff817c347c>] ? retint_restore_args+0x5/0x6
      [    0.020481]  [<ffffffff817caaa0>] ? gs_change+0x13/0x13
      [    0.020500] ---[ end trace a7919e7f17c0a725 ]---
      
      The new code will change every of the 16 low bits read from the
      register and tries to write and read-back that modified number
      from the MSR.
      Signed-off-by: NAndre Przywara <andre.przywara@amd.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Avi Kivity <avi@redhat.com>
      Link: http://lkml.kernel.org/r/1349797115-28346-2-git-send-email-andre.przywara@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bffd5fc2
  10. 23 10月, 2012 3 次提交
    • S
      KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT · c5e015d4
      Sasha Levin 提交于
      KVM_PV_REASON_PAGE_NOT_PRESENT kicks cpu out of idleness, but we haven't
      marked that spot as an exit from idleness.
      
      Not doing so can cause RCU warnings such as:
      
      [  732.788386] ===============================
      [  732.789803] [ INFO: suspicious RCU usage. ]
      [  732.790032] 3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63 Tainted: G        W
      [  732.790032] -------------------------------
      [  732.790032] include/linux/rcupdate.h:738 rcu_read_lock() used illegally while idle!
      [  732.790032]
      [  732.790032] other info that might help us debug this:
      [  732.790032]
      [  732.790032]
      [  732.790032] RCU used illegally from idle CPU!
      [  732.790032] rcu_scheduler_active = 1, debug_locks = 1
      [  732.790032] RCU used illegally from extended quiescent state!
      [  732.790032] 2 locks held by trinity-child31/8252:
      [  732.790032]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff83a67528>] __schedule+0x178/0x8f0
      [  732.790032]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81152bde>] cpuacct_charge+0xe/0x200
      [  732.790032]
      [  732.790032] stack backtrace:
      [  732.790032] Pid: 8252, comm: trinity-child31 Tainted: G        W    3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63
      [  732.790032] Call Trace:
      [  732.790032]  [<ffffffff8118266b>] lockdep_rcu_suspicious+0x10b/0x120
      [  732.790032]  [<ffffffff81152c60>] cpuacct_charge+0x90/0x200
      [  732.790032]  [<ffffffff81152bde>] ? cpuacct_charge+0xe/0x200
      [  732.790032]  [<ffffffff81158093>] update_curr+0x1a3/0x270
      [  732.790032]  [<ffffffff81158a6a>] dequeue_entity+0x2a/0x210
      [  732.790032]  [<ffffffff81158ea5>] dequeue_task_fair+0x45/0x130
      [  732.790032]  [<ffffffff8114ae29>] dequeue_task+0x89/0xa0
      [  732.790032]  [<ffffffff8114bb9e>] deactivate_task+0x1e/0x20
      [  732.790032]  [<ffffffff83a67c29>] __schedule+0x879/0x8f0
      [  732.790032]  [<ffffffff8117e20d>] ? trace_hardirqs_off+0xd/0x10
      [  732.790032]  [<ffffffff810a37a5>] ? kvm_async_pf_task_wait+0x1d5/0x2b0
      [  732.790032]  [<ffffffff83a67cf5>] schedule+0x55/0x60
      [  732.790032]  [<ffffffff810a37c4>] kvm_async_pf_task_wait+0x1f4/0x2b0
      [  732.790032]  [<ffffffff81139e50>] ? abort_exclusive_wait+0xb0/0xb0
      [  732.790032]  [<ffffffff81139c25>] ? prepare_to_wait+0x25/0x90
      [  732.790032]  [<ffffffff810a3a66>] do_async_page_fault+0x56/0xa0
      [  732.790032]  [<ffffffff83a6a6e8>] async_page_fault+0x28/0x30
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Acked-by: NGleb Natapov <gleb@redhat.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      c5e015d4
    • G
      7f46ddbd
    • X
      KVM: MMU: fix release noslot pfn · f3ac1a4b
      Xiao Guangrong 提交于
      We can not directly call kvm_release_pfn_clean to release the pfn
      since we can meet noslot pfn which is used to cache mmio info into
      spte
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      f3ac1a4b
  11. 20 10月, 2012 5 次提交