Commits · afd80d85aefac27e6e2f9dc10f60515357c504d2 · openeuler / Kernel

02 Apr, 2013 1 commit

pmu: prepare for migration support · afd80d85

Committed by Paolo Bonzini on Mar 28, 2013

In order to migrate the PMU state correctly, we need to restore the
values of MSR_CORE_PERF_GLOBAL_STATUS (a read-only register) and
MSR_CORE_PERF_GLOBAL_OVF_CTRL (which has side effects when written).
We also need to write the full 40-bit value of the performance counter,
which would only be possible with a v3 architectural PMU's full-width
counter MSRs.

To distinguish host-initiated writes from the guest's, pass the
full struct msr_data to kvm_pmu_set_msr.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
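
The host-versus-guest distinction described in this commit can be pictured with a minimal sketch. It is not the kernel's code: `struct toy_pmu` and `toy_pmu_set_msr` are hypothetical names, and the `struct msr_data` below is a simplified stand-in that only assumes the structure carries a flag telling host-initiated writes apart from guest writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the structure passed to the MSR write path. */
struct msr_data {
	bool host_initiated; /* true when userspace restores state, e.g. on migration */
	uint32_t index;
	uint64_t data;
};

/* Architectural MSR number of IA32_PERF_GLOBAL_STATUS. */
#define MSR_CORE_PERF_GLOBAL_STATUS 0x38e

/* Hypothetical PMU state holding only the register used in this sketch. */
struct toy_pmu {
	uint64_t global_status;
};

/* Returns 0 if the write was accepted, 1 if it was rejected. */
int toy_pmu_set_msr(struct toy_pmu *pmu, const struct msr_data *msr)
{
	switch (msr->index) {
	case MSR_CORE_PERF_GLOBAL_STATUS:
		/* Read-only for the guest; only the host may restore it. */
		if (!msr->host_initiated)
			return 1;
		pmu->global_status = msr->data;
		return 0;
	default:
		return 1; /* other MSRs are outside this sketch */
	}
}
```

Passing the whole structure rather than just index and value is what lets the handler see who initiated the write.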

22 Mar, 2013 2 commits

KVM: MMU: Rename kvm_mmu_free_some_pages() to make_mmu_pages_available() · 81f4f76b

Committed by Takuya Yoshikawa on Mar 21, 2013

The current name "kvm_mmu_free_some_pages" should be used for something
that actually frees some shadow pages, as we expect from the name, but
what the function is doing is to make some, KVM_MIN_FREE_MMU_PAGES,
shadow pages available: it does nothing when there are enough.

This patch changes the name to reflect this meaning better; while doing
this renaming, the code in the wrapper function is inlined into the main
body since the whole function will be inlined into the only caller now.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Move kvm_mmu_free_some_pages() into kvm_mmu_alloc_page() · 7ddca7e4

Committed by Takuya Yoshikawa on Mar 21, 2013

What this function is doing is to ensure that the number of shadow pages
does not exceed the maximum limit stored in n_max_mmu_pages: so this is
placed at every code path that can reach kvm_mmu_alloc_page().

Although it might have some sense to spread this function in each such
code path when it could be called before taking mmu_lock, the rule was
changed not to do so.

Taking this background into account, this patch moves it into
kvm_mmu_alloc_page() and simplifies the code.

Note: the unlikely hint in kvm_mmu_free_some_pages() guarantees that the
overhead of this function is almost zero except when we actually need to
allocate some shadow pages, so we do not need to care about calling it
multiple times in one path by doing kvm_mmu_get_page() a few times.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

21 Mar, 2013 1 commit

KVM: x86: correctly initialize the CS base on reset · 04b66839

Committed by Paolo Bonzini on Mar 19, 2013

The CS base was initialized to 0 on VMX (wrong, but usually overridden
by userspace before starting) or 0xf0000 on SVM.  The correct value is
0xffff0000, and VMX is able to emulate it now, so use it.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
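
To see why the base value matters, a small sketch of the address arithmetic may help. It assumes the usual x86 reset state (IP 0xfff0) and a hypothetical helper `first_fetch_address`; only the CS base differs between the three cases the message mentions.

```c
#include <assert.h>
#include <stdint.h>

/* x86 reset state: the first fetch happens at CS.base + IP. */
#define RESET_CS_BASE 0xffff0000u
#define RESET_IP      0xfff0u

/* Linear address of the first instruction fetched after reset. */
uint32_t first_fetch_address(uint32_t cs_base, uint16_t ip)
{
	return cs_base + ip;
}
```

With the corrected base the first fetch lands at 0xfffffff0, just below 4 GiB; a base of 0xf0000 would place it at 0xffff0, and a base of 0 at 0xfff0.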

20 Mar, 2013 2 commits

KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797) · 0b79459b

Committed by Andy Honig on Feb 20, 2013

There is a potential use after free issue with the handling of
MSR_KVM_SYSTEM_TIME. If the guest specifies a GPA in a movable or removable
memory such as frame buffers then KVM might continue to write to that
address even after it's removed via KVM_SET_USER_MEMORY_REGION. KVM pins
the page in memory so it's unlikely to cause an issue, but if the user
space component re-purposes the memory previously used for the guest, then
the guest will be able to corrupt that memory.

Tested: Tested against kvmclock unit test
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) · c300aa64

Committed by Andy Honig on Mar 11, 2013

If the guest sets the GPA of the time_page so that the request to update the
time straddles a page then KVM will write onto an incorrect page.  The
write is done by using kmap atomic to get a pointer to the page for the time
structure and then performing a memcpy to that page starting at an offset
that the guest controls.  Well behaved guests always provide a 32-byte aligned
address, however a malicious guest could use this to corrupt host kernel
memory.

Tested: Tested against kvmclock unit test.
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
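
The straddling condition can be sketched as follows. This is an illustration, not the patch itself: the helper names are hypothetical, the page size is assumed to be 4096 bytes, and the 32-byte structure size is taken from the "32-byte aligned" remark above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE  4096u
#define TIME_INFO_SIZE 32u /* assumed size of the time structure */

/* True if a write of len bytes starting at gpa crosses a page boundary. */
bool write_straddles_page(uint64_t gpa, size_t len)
{
	uint64_t offset = gpa & (TOY_PAGE_SIZE - 1);

	return offset + len > TOY_PAGE_SIZE;
}

/*
 * Rejecting addresses that are not aligned to the structure size rules
 * out straddling, because the structure size divides the page size.
 */
bool time_gpa_is_acceptable(uint64_t gpa)
{
	return (gpa & (TIME_INFO_SIZE - 1)) == 0;
}
```

An address at page offset 4080 would make a 32-byte write spill 16 bytes into the next page, and the alignment check refuses it.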

19 Mar, 2013 2 commits

KVM: x86: fix deadlock in clock-in-progress request handling · c09664bb

Committed by Marcelo Tosatti on Mar 18, 2013

There is a deadlock in pvclock handling:

cpu0:                                    cpu1:
kvm_gen_update_masterclock()
                                         kvm_guest_time_update()
 spin_lock(pvclock_gtod_sync_lock)
                                          local_irq_save(flags)
                                          spin_lock(pvclock_gtod_sync_lock)

 kvm_make_mclock_inprogress_request(kvm)
  make_all_cpus_request()
   smp_call_function_many()

Now if smp_call_function_many() called by cpu0 tries to call function on
cpu1 there will be a deadlock.

Fix by moving pvclock_gtod_sync_lock protected section outside irq
disabled section.

Analyzed by Gleb Natapov <gleb@redhat.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Reported-and-Tested-by: Yongjie Ren <yongjie.ren@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Require KVM_SET_TSS_ADDR being called prior to running a VCPU · 4918c6ca

Committed by Jan Kiszka on Mar 15, 2013

Very old user space (namely qemu-kvm before kvm-49) didn't set the TSS
base before running the VCPU. We always warned about this bug, but no
reports about users actually seeing this are known. Time to finally
remove the workaround that effectively prevented calling vmx_vcpu_reset
while already holding the KVM srcu lock.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

18 Mar, 2013 1 commit

perf,x86: fix wrmsr_on_cpu() warning on suspend/resume · 2a6e06b2

Committed by Linus Torvalds on Mar 17, 2013

Commit 1d9d8639 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") fixed a crash when doing PEBS performance profiling
after resuming, but in using init_debug_store_on_cpu() to restore the
DS_AREA MSR it also resulted in a new WARN_ON() triggering.

init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU
cross-calls to do the MSR update. Which is not really valid at the
early resume stage, and the warning is quite reasonable. Now, it all
happens to _work_, for the simple reason that smp_call_function_single()
ends up just doing the call directly on the CPU when the CPU number
matches, but we really should just do the wrmsr() directly instead.

This duplicates the wrmsr() logic, but hopefully we can just remove the
wrmsr_on_cpu() version eventually.
Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

16 Mar, 2013 1 commit

perf,x86: fix kernel crash with PEBS/BTS after suspend/resume · 1d9d8639

Committed by Stephane Eranian on Mar 15, 2013

This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. Turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps its power-on/resume value of 0, causing any PEBS
measurement to crash when running on CPU0.

The workaround is to add a hook in the actual resume code to restore
the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
the DS_AREA will be restored twice but this is harmless.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 Mar, 2013 4 commits

KVM: x86: Optimize mmio spte zapping when creating/moving memslot · 982b3394

Committed by Takuya Yoshikawa on Mar 12, 2013

When we create or move a memory slot, we need to zap mmio sptes.
Currently, zap_all() is used for this and this is causing two problems:
 - extra page faults after zapping mmu pages
 - long mmu_lock hold time during zapping mmu pages

For the latter, Marcelo reported a disastrous mmu_lock hold time during
hot-plug, which made the guest unresponsive for a long time.

This patch takes a simple way to fix these problems: do not zap mmu
pages unless they are marked mmio cached.  On our test box, this took
only 50us for the 4GB guest and we did not see ms of mmu_lock hold time
any more.

Note that we still need to do zap_all() for other cases.  So another
work is also needed: Xiao's work may be the one.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
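
The selective zapping idea can be sketched in a few lines. This is a simplified model under stated assumptions: `struct toy_mmu_page` and `toy_zap_mmio_pages` are hypothetical, pages sit in a plain array instead of the kernel's lists, and "zapping" is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified shadow page descriptor. */
struct toy_mmu_page {
	bool mmio_cached; /* set when an mmio spte was created in this page */
	bool zapped;
};

/* Zap only the pages marked mmio cached; returns how many were zapped. */
size_t toy_zap_mmio_pages(struct toy_mmu_page *pages, size_t n)
{
	size_t zapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].mmio_cached)
			continue; /* unrelated page: leave it in place */
		pages[i].zapped = true;
		zapped++;
	}
	return zapped;
}
```

Pages that never held an mmio spte stay mapped, which is where the saving in page faults and lock hold time would come from.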

KVM: MMU: Mark sp mmio cached when creating mmio spte · 95b0430d

Committed by Takuya Yoshikawa on Mar 12, 2013

This will be used not to zap unrelated mmu pages when creating/moving
a memory slot later.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Add preemption timer support · 0238ea91

Committed by Jan Kiszka on Mar 13, 2013

Provided the host has this feature, it's straightforward to offer it to
the guest as well. We just need to load the timer value on L2 entry if
the feature was enabled by L1 and watch out for the corresponding exit
reason.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Provide EFER.LMA saving support · c18911a2

Committed by Jan Kiszka on Mar 13, 2013

We will need EFER.LMA saving to provide unrestricted guest mode. All
that is missing for this is picking up EFER.LMA from VM_ENTRY_CONTROLS
on L2->L1 switches. If the host does not support EFER.LMA saving,
no change is performed, otherwise we properly emulate for L1 what the
hardware does for L0. Advertise the support, depending on the host
feature.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

13 Mar, 2013 4 commits

KVM: nVMX: Clean up and fix pin-based execution controls · eabeaacc

Committed by Jan Kiszka on Mar 13, 2013

Only interrupt and NMI exiting are mandatory for KVM to work and thus can
be exposed to the guest unconditionally; virtual NMI exiting is
optional. So we must not advertise it unless the host supports it.

Introduce the symbolic constant PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR at
this chance.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86: Rework INIT and SIPI handling · 66450a21

Committed by Jan Kiszka on Mar 13, 2013

A VCPU sending INIT or SIPI to some other VCPU races for setting the
remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED
was overwritten by kvm_emulate_halt and, thus, got lost.

This introduces APIC events for those two signals, keeping them in
kvm_apic until kvm_apic_accept_events is run over the target vcpu
context. kvm_apic_has_events reports to kvm_arch_vcpu_runnable if there
are pending events, thus if vcpu blocking should end.

The patch comes with the side effect of effectively obsoleting
KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
state. That also means we no longer exit to user space after receiving a
SIPI event.

Furthermore, we already reset the VCPU on INIT, only fixing up the code
segment later on when SIPI arrives. Moreover, we fix INIT handling for
the BSP: it never enters wait-for-SIPI but directly starts over on INIT.
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: MMU: make kvm_mmu_available_pages robust against n_used_mmu_pages > n_max_mmu_pages · 5d218814

Committed by Marcelo Tosatti on Mar 12, 2013

As noticed by Ulrich Obergfell <uobergfe@redhat.com>, the mmu
counters are for beancounting purposes only - so n_used_mmu_pages and
n_max_mmu_pages could be relaxed (example: before f0f5933a),
resulting in n_used_mmu_pages > n_max_mmu_pages.

Make code robust against n_used_mmu_pages > n_max_mmu_pages.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
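
Why the robustness matters becomes clear with unsigned arithmetic. The sketch below uses hypothetical helper names and plain unsigned integers in place of the kernel's counters; it contrasts a plain subtraction with a guarded one.

```c
#include <assert.h>

/* Plain subtraction: wraps around to a huge value when n_used > n_max. */
unsigned int naive_available(unsigned int n_max, unsigned int n_used)
{
	return n_max - n_used;
}

/* Guarded form: reports zero available pages instead of wrapping. */
unsigned int robust_available(unsigned int n_max, unsigned int n_used)
{
	if (n_max > n_used)
		return n_max - n_used;
	return 0;
}
```

With n_used above n_max, the naive form would claim that billions of pages are available, so callers would never try to free any.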

Select VIRT_TO_BUS directly where needed · 4febd95a

Committed by Stephen Rothwell on Mar 07, 2013

In commit 887cbce0 ("arch Kconfig: centralise ARCH_NO_VIRT_TO_BUS")
I introduced the config symbol HAVE_VIRT_TO_BUS and selected that where
needed.  I am not sure what I was thinking.  Instead, just directly
select VIRT_TO_BUS where it is needed.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Mar, 2013 2 commits

KVM: x86: Drop unused return code from VCPU reset callback · 57f252f2

Committed by Jan Kiszka on Mar 12, 2013

Neither vmx nor svm nor the common part may generate an error on
kvm_vcpu_reset. So drop the return code.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

VMX: x86: handle host TSC calibration failure · 03ba32ca

Committed by Marcelo Tosatti on Mar 11, 2013

If the host TSC calibration fails, tsc_khz is zero (see tsc_init.c).
Handle such case properly in KVM (instead of dividing by zero).

https://bugzilla.redhat.com/show_bug.cgi?id=859282

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

11 Mar, 2013 1 commit

kvm: remove cast for kmalloc return value · 0fa24ce3

Committed by Ioan Orghici on Mar 10, 2013

Signed-off-by: Ioan Orghici <ioan.orghici@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

08 Mar, 2013 6 commits

x86: Do not try to sync identity map for non-mapped pages · 60f583d5

Committed by Dave Hansen on Mar 07, 2013

kernel_map_sync_memtype() is called from a variety of contexts. The
pat.c code that calls it seems to ensure that it is not called for
non-ram areas by checking via pat_pagerange_is_ram(). It is important
that it only be called on the actual identity map because there *IS*
no map to sync for highmem pages, or for memory holes.

The ioremap.c uses are not as careful as those from pat.c, and call
kernel_map_sync_memtype() on PCI space which is in the middle of the
kernel identity map _range_, but is not actually mapped.

This patch adds a check to kernel_map_sync_memtype() which probably
duplicates some of the checks already in pat.c. But, it is necessary
for the ioremap.c uses and shouldn't hurt other callers.

I have reproduced this bug and this patch fixes it for me and the
original bug reporter:

https://lkml.org/lkml/2013/2/5/396

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130307163151.D9B58C4E@kernel.stglabs.ibm.com
Signed-off-by: Dave Hansen <dave@sr71.net>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

KVM: MMU: Introduce a helper function for FIFO zapping · 5da59607

Committed by Takuya Yoshikawa on Mar 06, 2013

Make the code for zapping the oldest mmu page, placed at the tail of the
active list, a separate function.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Use list_for_each_entry_safe in kvm_mmu_commit_zap_page() · 945315b9

Committed by Takuya Yoshikawa on Mar 06, 2013

We are traversing the linked list, invalid_list, deleting each entry by
kvm_mmu_free_page().  _safe version is there for such a case.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
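
The pattern behind the _safe iterators can be shown outside the kernel. The sketch below is an assumption-laden stand-in: it uses a hand-rolled singly linked list with hypothetical `push` and `free_all` helpers rather than the kernel's list API, but the key step is the same, namely reading the link to the next entry before the current one is freed.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	struct node *next;
	int id;
};

/* Prepend a node; returns the new head (or the old one if malloc fails). */
struct node *push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->id = id;
	n->next = head;
	return n;
}

/* Free every node while walking the list; returns how many were freed. */
size_t free_all(struct node **head)
{
	size_t freed = 0;
	struct node *cur = *head, *next;

	while (cur) {
		next = cur->next; /* read the link before the node is freed */
		free(cur);
		freed++;
		cur = next;
	}
	*head = NULL;
	return freed;
}
```

A loop that advanced with `cur = cur->next` after `free(cur)` would read freed memory, which is the situation the _safe variant exists to avoid.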

KVM: MMU: Fix and clean up for_each_gfn_* macros · 1044b030

Committed by Takuya Yoshikawa on Mar 06, 2013

The expression (sp)->gfn should not be expanded using @gfn.
Although no user of these macros passes a string other than gfn now,
this should be fixed before anyone sees strange errors.

Note: ignored the following checkpatch errors:
  ERROR: Macros with complex values should be enclosed in parenthesis
  ERROR: trailing statements should be on next line
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
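
The macro problem can be reproduced with a much smaller macro. The sketch below is illustrative only: `page_matches`, `struct toy_page` and `count_matches` are hypothetical, and the underscore-prefixed parameter names are just one way to avoid the collision.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
	unsigned long gfn;
};

/*
 * Buggy form, shown for comparison only (not compiled): with a parameter
 * named gfn, the member name in (sp)->gfn is replaced by the argument too.
 *
 *   #define page_matches(sp, gfn) ((sp)->gfn == (gfn))
 *   page_matches(p, target)  expands to  ((p)->target == (target))
 */

/* Fixed form: the parameter names cannot collide with the member name. */
#define page_matches(_sp, _gfn) ((_sp)->gfn == (_gfn))

/* Count the pages whose gfn equals target. */
size_t count_matches(const struct toy_page *pages, size_t n,
		     unsigned long target)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (page_matches(&pages[i], target))
			hits++;
	return hits;
}
```

The buggy form only works as long as every caller happens to pass a variable that is itself called gfn, which matches the situation the message describes.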

KVM: nVMX: Fix setting of CR0 and CR4 in guest mode · 1a0d74e6

Committed by Jan Kiszka on Mar 07, 2013

The logic for calculating the value with which we call kvm_set_cr0/4 was
broken (will definitely be visible with nested unrestricted guest mode
support). Also, we performed the check regarding CR0_ALWAYSON too early
when in guest mode.

What really needs to be done on both CR0 and CR4 is to mask out L1-owned
bits and merge them in from L1's guest_cr0/4. In contrast, arch.cr0/4
and arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and,
thus, are not suited as input.

For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
refuse the update if it fails. To be fully consistent, we implement this
check now also for CR4. For CR4, we move the check into vmx_set_cr4
while we keep it in handle_set_cr0. This is because the CR0 checks for
vmxon vs. guest mode will diverge soon when adding unrestricted guest
mode support.

Finally, we have to set the shadow to the value L2 wanted to write
originally.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
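
The "mask out and merge in" step amounts to a standard bit-merge. The following sketch shows only that arithmetic; `merge_cr` and `has_always_on_bits` are hypothetical helpers, and the mask and always-on values used with them are made-up numbers, not the real CR0/CR4 bit layouts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bits set in l1_owned_mask are taken from L1's guest_cr value; all
 * other bits are taken from the value L2 tried to write.
 */
uint64_t merge_cr(uint64_t l2_val, uint64_t l1_guest_cr,
		  uint64_t l1_owned_mask)
{
	return (l2_val & ~l1_owned_mask) | (l1_guest_cr & l1_owned_mask);
}

/* Nonzero if every always-on bit is set in val. */
int has_always_on_bits(uint64_t val, uint64_t always_on)
{
	return (val & always_on) == always_on;
}
```

The always-on check would then be applied to the merged value, and the update refused if it fails.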

KVM: nVMX: Fix content of MSR_IA32_VMX_ENTRY/EXIT_CTLS · 33fb20c3

Committed by Jan Kiszka on Mar 06, 2013

Properly set those bits to 1 that the spec demands in case bit 55 of
VMX_BASIC is 0 - like in our case.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

07 Mar, 2013 4 commits

x86, doc: Be explicit about what the x86 struct boot_params requires · 3c4aff6b

Committed by Peter Jones on Mar 06, 2013

If the sentinel triggers, we do not want the boot loader authors to
just poke it and make the error go away, we want them to actually fix
the problem.

This should help avoid making the incorrect change in non-compliant
bootloaders.

[ hpa: dropped the Documentation/x86/boot.txt hunk pending
  clarifications ]
Signed-off-by: Peter Jones <pjones@redhat.com>
Link: http://lkml.kernel.org/r/1362592823-28967-1-git-send-email-pjones@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86: Don't clear efi_info even if the sentinel hits · 2e604c0f

Committed by Josh Boyer on Mar 06, 2013

When boot_params->sentinel is set, all we really know is that some
undefined set of fields in struct boot_params contain garbage. In the
particular case of efi_info, however, there is a private magic for
that substructure, so it is generally safe to leave it even if the
bootloader is broken.

kexec (for which we did the initial analysis) did not initialize this
field, but of course all the EFI bootloaders do, and most EFI
bootloaders are broken in this respect (and should be fixed.)
Reported-by: Robin Holt <holt@sgi.com>
Link: http://lkml.kernel.org/r/CA%2B5PVA51-FT14p4CRYKbicykugVb=PiaEycdQ57CK2km_OQuRQ@mail.gmail.com
Tested-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86, mm: Make sure to find a 2M free block for the first mapped area · 98e7a989

Committed by Yinghai Lu on Mar 06, 2013

Henrik reported that his MacAir 3.1 would not boot with

| commit 8d57470d
| Date:   Fri Nov 16 19:38:58 2012 -0800
|
|    x86, mm: setup page table in top-down

It turns out that we do not calculate the real_end properly:
We try to get 2M size with 4K alignment, and later will round down
to 2M, so we will get less than 2M for the first mapping; in the extreme
case it could be only 4K. In Henrik's system it has (1M-32K), as the
last usable range is [mem 0x7f9db000-0x7fef8fff].

The problem is exposed when EFI booting have several holes and it
will force mapping to use PTE instead as we only map usable areas.

To fix it, just make it be 2M aligned, so we can be guaranteed to be
able to use large pages to map it.
Reported-by: Henrik Rydberg <rydberg@euromail.se>
Bisected-by: Henrik Rydberg <rydberg@euromail.se>
Tested-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQX4nQ7_1kg5RL_vh56rmcSHXUi1ExrZX7CwED4NGMnHfg@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
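
The alignment arithmetic can be sketched with two small helpers. This is a model of the calculation described above, not the kernel's mapping code: the function names are hypothetical, and the example end address 0x7fe01000 is made up to hit the extreme 4K case rather than taken from the report.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K 0x1000ull
#define SZ_2M 0x200000ull

/* Round x down to a multiple of align (align must be a power of two). */
uint64_t round_down_to(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/*
 * 4K-aligned search: the 2M block ends at the 4K-aligned usable end, and
 * the first mapping covers only the part above the last 2M boundary.
 */
uint64_t first_mapped_size_4k_aligned(uint64_t usable_end)
{
	uint64_t real_end = round_down_to(usable_end, SZ_4K);

	return real_end - round_down_to(real_end - 1, SZ_2M);
}

/* 2M-aligned search: the block itself starts and ends on 2M boundaries. */
uint64_t first_mapped_size_2m_aligned(uint64_t usable_end)
{
	uint64_t addr = round_down_to(usable_end - SZ_2M, SZ_2M);
	uint64_t real_end = addr + SZ_2M;

	return real_end - round_down_to(real_end - 1, SZ_2M);
}
```

When the usable end sits 4K above a 2M boundary, the first variant would leave only 4K for the first mapping, while the second always yields a full 2M block.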

x86: Fix 32-bit *_cpu_data initializers · 015221fe

Committed by Krzysztof Mazur on Mar 03, 2013

The commit 27be4570
('x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok
flag') removed the hlt_works_ok flag from struct cpuinfo_x86, but
boot_cpu_data and new_cpu_data initializers were not changed
causing setting f00f_bug flag, instead of fdiv_bug.

If CONFIG_X86_F00F_BUG is not set the f00f_bug flag is never
cleared.

To avoid such problems in future C99-style initialization is now
used.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: len.brown@intel.com
Link: http://lkml.kernel.org/r/1362266082-2227-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
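
How a removed field silently shifts a positional initializer can be shown with a toy structure. `struct toy_cpuinfo` is a hypothetical, shortened stand-in for the real structure, and the initializer values are chosen for illustration only.

```c
#include <assert.h>

/* Hypothetical, shortened stand-in for the CPU info structure. */
struct toy_cpuinfo {
	/* char hlt_works_ok;   removed by an earlier change */
	char fdiv_bug;
	char f00f_bug;
	char coma_bug;
};

/*
 * Stale positional initializer, written for the old layout
 * { hlt_works_ok, fdiv_bug, f00f_bug, ... }: every value now lands one
 * field too early, so the 1 meant for fdiv_bug ends up in f00f_bug.
 */
const struct toy_cpuinfo stale_positional = { 0, 1, 0 };

/* Designated initializer: unaffected by fields being added or removed. */
const struct toy_cpuinfo designated = { .fdiv_bug = 1 };
```

The compiler accepts both forms without complaint, which is why the shifted values went unnoticed.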

06 Mar, 2013 2 commits

KVM: nVMX: Reset RFLAGS on VM-exit · c4627c72

Committed by Jan Kiszka on Mar 03, 2013

Ouch, how could this work so well that far? We need to clear RFLAGS to
the reset value as specified by the SDM. Particularly, IF must be off
after VM-exit!
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86, smpboot: Remove unused variable · 576cfb40

Committed by Borislav Petkov on Mar 04, 2013

The cpuinfo_x86 ptr is unused now. Drop it. Got obsolete by 69fb3676
("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
removing its only user.

[ hpa: fixes gcc warning ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1362428180-8865-2-git-send-email-bp@alien8.de
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

05 Mar, 2013 5 commits

KVM: nVMX: Fix switching of debug state · 503cd0c5

Committed by Jan Kiszka on Mar 03, 2013

First of all, do not blindly overwrite GUEST_DR7 on L2 entry. The host
may have guest debugging enabled. Then properly reset DR7 and DEBUG_CTL
on L2->L1 switch as specified in the SDM.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: set_memory_region: Refactor commit_memory_region() · 8482644a

Committed by Takuya Yoshikawa on Feb 27, 2013

This patch makes the parameter old a const pointer to the old memory
slot and adds a new parameter named change to know the change being
requested: the former is for removing extra copying and the latter is
for cleaning up the code.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: set_memory_region: Refactor prepare_memory_region() · 7b6195a9

Committed by Takuya Yoshikawa on Feb 27, 2013

This patch drops the parameter old, a copy of the old memory slot, and
adds a new parameter named change to know the change being requested.

This not only cleans up the code but also removes extra copying of the
memory slot structure.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: set_memory_region: Drop user_alloc from set_memory_region() · 47ae31e2

Committed by Takuya Yoshikawa on Feb 27, 2013

Except ia64's stale code, KVM_SET_MEMORY_REGION support, this is only
used for sanity checks in __kvm_set_memory_region() which can easily
be changed to use slot id instead.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: set_memory_region: Drop user_alloc from prepare/commit_memory_region() · 462fce46

Committed by Takuya Yoshikawa on Feb 27, 2013

X86 does not use this any more.  The remaining user, s390's !user_alloc
check, can be simply removed since KVM_SET_MEMORY_REGION ioctl is no
longer supported.

Note: fixed powerpc's indentations with spaces to suppress checkpatch
errors.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

03 Mar, 2013 1 commit

x86, ACPI, mm: Revert movablemem_map support · 20e6926d

Committed by Yinghai Lu on Mar 01, 2013

Tim found:

  WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
  Hardware name: S2600CP
  sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
  smpboot: Booting Node   1, Processors  #1
  Modules linked in:
  Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
  Call Trace:
    set_cpu_sibling_map+0x279/0x449
    start_secondary+0x11d/0x1e5

Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")

It turns out movable_map has some problems, and it breaks several things

1. numa_init is called several times, NOT just for srat. so those
	nodes_clear(numa_nodes_parsed)
	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
   can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
   and make fall back path working.

2. simply split acpi_numa_init to early_parse_srat.
   a. that early_parse_srat is NOT called for ia64, so you break ia64.
   b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
	     set_apicid_to_node(i, NUMA_NO_NODE)
     still left in numa_init. So it will just clear result from early_parse_srat.
     it should be moved before that....
   c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
       early before override from INITRD is settled.

3. that patch TITLE is totally misleading, there is NO x86 in the title,
   but it changes critical x86 code. As a result, the x86 guys did not
   pay attention and find the problem early. Those patches really should
   be routed via tip/x86/mm.

4. after that commit, following range can not use movable ram:
  a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
  b. initrd... it will be freed after booting, so it could be on movable...
  c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
	anymore.
  d. init_mem_mapping: can not put page table high anymore.
  e. initmem_init: vmemmap can not be high local node anymore. That is
     not good.

If a node is hotpluggable, the mem related ranges like page table and
vmemmap could be on that node without problem and should be on that
node.

We have workaround patch that could fix some problems, but some can not
be fixed.

So just remove that offending commit and related ones including:

 f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
    protect movablecore_map in memblock_overlaps_region().")

 01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
    SRAT")

 27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
    the end of node")

 e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
    ready")

 fb06bc8e ("page_alloc: bootmem limit with movablecore_map")

 42f47e27 ("page_alloc: make movablemem_map have higher priority")

 6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
    movable limit for nodes")

 34b71f1e ("page_alloc: add movable_memmap kernel parameter")

 4d59a751 ("x86: get pg_data_t's memory from other node")

Later we should have patches that will make sure kernel put page table
and vmemmap on local node ram instead of push them down to node0.  Also
need to find way to put other kernel used ram to local node ram.
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: Don Morris <don.morris@hp.com>
Bisected-by: Don Morris <don.morris@hp.com>
Tested-by: Don Morris <don.morris@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

01 Mar, 2013 1 commit

xen/pci: We don't do multiple MSI's. · 884ac297

Committed by Konrad Rzeszutek Wilk on Feb 28, 2013

There is no hypercall to setup multiple MSI per PCI device.
As such with these two new commits:
-  08261d87
   PCI/MSI: Enable multiple MSIs with pci_enable_msi_block_auto()
- 5ca72c4f
   AHCI: Support multiple MSIs

we would call the PHYSDEVOP_map_pirq 'nvec' times with the same
contents of the PCI device. Sander discovered that we would get
the same PIRQ value 'nvec' times and return said values to the
caller. That of course meant that the device was configured only
with one MSI and AHCI would fail with:

ahci 0000:00:11.0: version 3.0
xen: registering gsi 19 triggering 0 polarity 1
xen: --> pirq=19 -> irq=19 (gsi=19)
(XEN) [2013-02-27 19:43:07] IOAPIC[0]: Set PCI routing entry (6-19 -> 0x99 -> IRQ 19 Mode:1 Active:1)
ahci 0000:00:11.0: AHCI 0001.0200 32 slots 4 ports 6 Gbps 0xf impl SATA mode
ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part
ahci: probe of 0000:00:11.0 failed with error -22

That is b/c in ahci_host_activate the second call to
devm_request_threaded_irq  would return -EINVAL as we passed in
(on the second run) an IRQ that was never initialized.

CC: stable@vger.kernel.org
Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
