Commit · 8915aa27d5efbb9185357175b0acf884325565f9 · openeuler / Kernel

12 Jun, 2013 · 1 commit

KVM: x86: handle idiv overflow at kvm_write_tsc · 8915aa27

Authored by Marcelo Tosatti on Jun 11, 2013

It's possible for idivl to overflow (a large delta stored in usdiff is a
valid scenario).

Create an exception handler to catch the overflow exception (division by zero
is protected by vcpu->arch.virtual_tsc_khz check), and interpret it accordingly
(delta is larger than USEC_PER_SEC).
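
The overflow condition can be sketched in plain C instead of kernel inline assembly. `idivl` divides a 64-bit dividend by a 32-bit divisor and raises a divide error when the quotient does not fit in 32 signed bits. The helper names below are hypothetical, and returning `USEC_PER_SEC + 1` is only one assumed way to encode "delta is larger than USEC_PER_SEC"; the patch itself catches the CPU exception rather than pre-checking the quotient.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000LL

/*
 * Hypothetical helper: divide like idivl (64-bit dividend, 32-bit divisor),
 * but report overflow instead of faulting. Returns false when the quotient
 * does not fit in a signed 32-bit register.
 */
static bool div64_by_32_checked(int64_t dividend, int32_t divisor, int32_t *quotient)
{
    assert(divisor > 0); /* stands in for the virtual_tsc_khz check */
    int64_t q = dividend / divisor;
    if (q > INT32_MAX || q < INT32_MIN)
        return false;
    *quotient = (int32_t)q;
    return true;
}

/*
 * Hypothetical wrapper: convert a TSC cycle delta to microseconds.
 * Assumes |cycles| is small enough that cycles * 1000 fits in 64 bits.
 */
static int64_t usdiff_from_cycles(int64_t cycles, int32_t tsc_khz)
{
    int32_t q;

    if (!div64_by_32_checked(cycles * 1000, tsc_khz, &q))
        return USEC_PER_SEC + 1; /* assumed encoding of "more than one second" */
    return q;
}
```

In this sketch a large negative delta saturates the same way as a large positive one, since both make the quotient unrepresentable.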

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=969644

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

05 Jun, 2013 · 11 commits

KVM: MMU: reduce KVM_REQ_MMU_RELOAD when root page is zapped · 05988d72

Authored by Gleb Natapov on May 31, 2013

Quote Gleb's mail:
| why don't we check for sp->role.invalid in
| kvm_mmu_prepare_zap_page before calling kvm_reload_remote_mmus()?

and

| Actually we can add check for is_obsolete_sp() there too since
| kvm_mmu_invalidate_all_pages() already calls kvm_reload_remote_mmus()
| after incrementing mmu_valid_gen.

[ Xiao: add some comments and the check of is_obsolete_sp() ]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: MMU: reclaim the zapped-obsolete page first · 365c8868

Authored by Xiao Guangrong on May 31, 2013

As Marcelo pointed out:
| "(retention of large number of pages while zapping)
| can be fatal, it can lead to OOM and host crash"

We introduce a list, kvm->arch.zapped_obsolete_pages, to link all
the pages which have been deleted from the mmu cache but not actually
freed. When page reclaiming is needed, we always zap these pages
first.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: MMU: collapse TLB flushes when zap all pages · f34d251d

Authored by Xiao Guangrong on May 31, 2013

kvm_zap_obsolete_pages uses the lock-break technique to zap pages,
and it flushes the TLB every time it breaks the lock.

We can reload the mmu on all vcpus after updating the generation
number so that the obsolete pages are no longer used on any vcpu;
after that we do not need to flush the TLB when obsolete pages
are zapped.

It calls kvm_mmu_prepare_zap_page many times and uses one
kvm_mmu_commit_zap_page to collapse the TLB flushes. The side effect
is that obsolete pages are unlinked from active_list but remain
on the hash list, so we add a comment around the hash list walker.

Note: kvm_mmu_commit_zap_page is still needed before freeing
the pages since other vcpus may be walking shadow pages
locklessly.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: MMU: zap pages in batch · e7d11c7a

Authored by Xiao Guangrong on May 31, 2013

Zap at least 10 pages before releasing mmu-lock to reduce the overhead
caused by acquiring the lock.

After the patch, kvm_zap_obsolete_pages can always make forward progress,
so update the comments.

[ It improves the case of building a kernel while reading the PCI ROM
  by 0.6% ~ 1%. ]

Note: I am not sure that "10" is the best speculative value; I just
guessed that "10" keeps a vcpu from spending a long time in
kvm_zap_obsolete_pages without leaving mmu-lock too contended.
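
The batching idea can be modelled with a small counting loop. This is a sketch under stated assumptions: the constant name `BATCH_ZAP_PAGES` and the function are illustrative, and the real loop zaps shadow pages under a spinlock and only yields when rescheduling is actually needed.

```c
#include <assert.h>

#define BATCH_ZAP_PAGES 10 /* the "10" discussed in the commit message */

/*
 * Model of the batching loop: "zap" obsolete_pages pages and count how
 * often the lock would be released. A lock break is only considered
 * after a full batch has been zapped.
 */
static int count_lock_breaks(int obsolete_pages)
{
    int batch = 0, lock_breaks = 0;

    assert(obsolete_pages >= 0);
    for (int zapped = 0; zapped < obsolete_pages; zapped++) {
        if (batch >= BATCH_ZAP_PAGES) {
            lock_breaks++; /* drop and retake the lock here */
            batch = 0;
        }
        batch++; /* one more page zapped under the lock */
    }
    return lock_breaks;
}
```

With this model, 25 obsolete pages cost two lock breaks instead of up to 24, and every lock hold makes progress on at least one page.
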

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: MMU: do not reuse the obsolete page · 7f52af74

Authored by Xiao Guangrong on May 31, 2013

The obsolete page will be zapped soon; do not reuse it, in order to
reduce future page faults.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages · 35006126

Authored by Xiao Guangrong on May 31, 2013

It is useful for debugging and development.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: MMU: show mmu_valid_gen in shadow page related tracepoints · 2248b023

Authored by Xiao Guangrong on May 31, 2013

Show sp->mmu_valid_gen

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86: use the fast way to invalidate all pages · 6ca18b69

Authored by Xiao Guangrong on May 31, 2013

Replace kvm_mmu_zap_all with kvm_mmu_invalidate_zap_all_pages

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: MMU: fast invalidate all pages · 5304b8d3

Authored by Xiao Guangrong on May 31, 2013

The current kvm_mmu_zap_all is really slow: it holds mmu-lock while
walking and zapping all shadow pages one by one, and it also needs to zap
every guest page's rmap and every shadow page's parent spte list. Things
get worse when the guest uses more memory or vcpus, which is bad for
scalability.

In this patch, we introduce a faster way to invalidate all shadow pages.
KVM maintains a global mmu generation number, stored in
kvm->arch.mmu_valid_gen, and every shadow page stores the current global
generation number in sp->mmu_valid_gen when it is created.

When KVM needs to zap all shadow page sptes, it simply increases the
global generation number and then reloads the root shadow pages on all
vcpus. Each vcpu creates a new shadow page table according to kvm's
current generation number, which ensures the old pages are no longer used.
The obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
are then zapped using the lock-break technique.
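
The generation-number scheme can be sketched in a few lines of userspace C. The struct and function names here are illustrative models, not the kernel definitions; only the field name `mmu_valid_gen` and the predicate name `is_obsolete_sp` are taken from the commit messages in this series.

```c
#include <stdbool.h>

/* Simplified stand-ins for kvm->arch and struct kvm_mmu_page. */
struct kvm_model { unsigned long mmu_valid_gen; };
struct shadow_page_model { unsigned long mmu_valid_gen; };

/* A new shadow page records the generation current at creation time. */
static void sp_create(const struct kvm_model *kvm, struct shadow_page_model *sp)
{
    sp->mmu_valid_gen = kvm->mmu_valid_gen;
}

/* A page is obsolete once its generation no longer matches the global one. */
static bool is_obsolete_sp(const struct kvm_model *kvm,
                           const struct shadow_page_model *sp)
{
    return sp->mmu_valid_gen != kvm->mmu_valid_gen;
}

/* Invalidating every page is a single increment, independent of page count. */
static void invalidate_all_pages(struct kvm_model *kvm)
{
    kvm->mmu_valid_gen++;
}
```

The point of the design is that invalidation itself touches no page; the cost of actually freeing obsolete pages is deferred and can be paid incrementally.
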

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: MMU: drop unnecessary kvm_reload_remote_mmus · a2ae1622

Authored by Xiao Guangrong on May 31, 2013

It is the responsibility of kvm_mmu_zap_all to keep the mmu and
TLBs consistent. The reload is also unnecessary after zapping
all mmio sptes, since no mmio spte exists on a root shadow
page and it cannot be cached in the TLB.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall · 758ccc89

Authored by Xiao Guangrong on May 31, 2013

Quote Gleb's mail:

| Back then kvm->lock protected memslot access so code like:
|
| mutex_lock(&vcpu->kvm->lock);
| kvm_mmu_zap_all(vcpu->kvm);
| mutex_unlock(&vcpu->kvm->lock);
|
| which is what 7aa81cc0 does was enough to guaranty that no vcpu will
| run while code is patched. This is no longer the case and
| mutex_lock(&vcpu->kvm->lock); is gone from that code path long time ago,
| so now kvm_mmu_zap_all() there is useless and the code is incorrect.

So we drop it; the underlying issue will be fixed later.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

21 May, 2013 · 8 commits

KVM: x86 emulator: convert XADD to fastop · e47a5f5f

Authored by Avi Kivity on Feb 09, 2013

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86 emulator: drop unused old-style inline emulation · 203831e8

Authored by Avi Kivity on Feb 09, 2013

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86 emulator: convert DIV/IDIV to fastop · b8c0b6ae

Authored by Avi Kivity on Feb 09, 2013

Since DIV and IDIV can generate exceptions, we need an additional output
parameter indicating whether an exception has occurred.  To avoid increasing
register pressure on i386, we use %rsi, which is already allocated for
the fastop code pointer.

Gleb: added comment about fop usage as exception indication.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86 emulator: convert single-operand MUL/IMUL to fastop · b9fa409b

Authored by Avi Kivity on Feb 09, 2013

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86 emulator: Switch fastop src operand to RDX · 017da7b6

Authored by Avi Kivity on Feb 09, 2013

This makes OpAccHi useful.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86 emulator: switch MUL/DIV to DstXacc · ab2c5ce6

Authored by Avi Kivity on Feb 09, 2013

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86 emulator: decode extended accumulator explicitly · 820207c8

Authored by Avi Kivity on Feb 09, 2013

Single-operand MUL and DIV access an extended accumulator: AX for byte
instructions, and DX:AX, EDX:EAX, or RDX:RAX for larger-sized instructions.
Add support for fetching the extended accumulator.

In order not to change things too much, RDX is loaded into Src2, which is
already loaded by fastop().  This avoids increasing register pressure on
i386.

Gleb: disable src writeback for ByteOp div/mul.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86 emulator: add support for writing back the source operand · fb32b1ed

Authored by Avi Kivity on Feb 09, 2013

Some instructions write back the source operand, not just the destination.
Add support for doing this via the decode flags.

Gleb: add BUG_ON() to prevent the source from being a memory operand.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

19 May, 2013 · 1 commit

KVM: get rid of $(addprefix ../../../virt/kvm/, ...) in Makefiles · 535cf7b3

Authored by Marc Zyngier on May 14, 2013

As requested by the KVM maintainers, remove the addprefix used to
refer to the main KVM code from the arch code, and replace it with
a KVM variable that does the same thing.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Christoffer Dall <cdall@cs.columbia.edu>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

16 May, 2013 · 2 commits

KVM: MMU: cleanup locking in mmu_free_roots() · 35af577a

Authored by Gleb Natapov on May 16, 2013

Do locking around each case separately instead of having one lock and two
unlocks. Move root_hpa assignment out of the lock.

Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86: limit difference between kvmclock updates · 0061d53d

Authored by Marcelo Tosatti on May 09, 2013

kvmclock updates which are isolated to a given vcpu, such as vcpu->cpu
migration, should not allow system_timestamp from the rest of the vcpus
to remain static. Otherwise ntp frequency correction applies to one
vcpu's system_timestamp but not the others.

So in those cases, request a kvmclock update for all vcpus. The worst
case for a remote vcpu to update its kvmclock is then bounded by maximum
nohz sleep latency.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

14 May, 2013 · 1 commit

KVM: x86: Remove support for reporting coalesced APIC IRQs · f1ed0450

Authored by Jan Kiszka on Apr 28, 2013

Since the arrival of posted interrupt support we can no longer guarantee
that coalesced IRQs are always reported to the IRQ source. Moreover,
accumulated APIC timer events could cause a busy loop when a VCPU should
rather be halted. The consensus is to remove coalesced tracking from the
LAPIC.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

12 May, 2013 · 1 commit

KVM: MMU: Use kvm_mmu_sync_roots() in kvm_mmu_load() · e2858b4a

Authored by Takuya Yoshikawa on May 09, 2013

No need to open-code this function.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

10 May, 2013 · 3 commits

xen/pci: Use cached MSI-X capability offset · 7c86617d

Authored by Bjorn Helgaas on Apr 22, 2013

We now cache the MSI-X capability offset in the struct pci_dev, so no
need to find the capability again.

Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK · 4be6bfe2

Authored by Bjorn Helgaas on Apr 22, 2013

PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the
Table Offset register, not the flags ("Message Control" per spec)
register.

Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE · 91c2e0bc

Authored by Al Viro on Mar 05, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

09 May, 2013 · 4 commits

KVM: emulator: emulate SALC · 326f578f

Authored by Paolo Bonzini on May 09, 2013

This is an almost-undocumented instruction available in 32-bit mode.
I say "almost" undocumented because AMD documents it in their opcode
maps just to say that it is unavailable in 64-bit mode (sections
"A.2.1 One-Byte Opcodes" and "B.3 Invalid and Reassigned Instructions
in 64-Bit Mode").

It is roughly equivalent to "sbb %al, %al" except it does not
set the flags.  Use fastop to emulate it, but do not use the opcode
directly because it would fail if the host is 64-bit!
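
As a reference for the semantics described above, SALC can be modelled as a pure function of the carry flag. This is a sketch of the instruction's architectural effect only; the function name is made up and it is unrelated to how the emulator's fastop machinery is written.

```c
#include <stdbool.h>
#include <stdint.h>

/* SALC: AL becomes 0xFF when CF is set and 0x00 otherwise; flags are untouched. */
static uint8_t salc(bool carry_flag)
{
    return carry_flag ? 0xFF : 0x00;
}
```
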

Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@vger.kernel.org # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: emulator: emulate XLAT · 7fa57952

Authored by Paolo Bonzini on May 09, 2013

This is used by SGABIOS; without it KVM breaks with emulate_invalid_guest_state=1.
It is just a MOV in disguise, with a funny source address.
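
The "funny source address" is the table base in (E)BX plus the zero-extended value of AL. A minimal model of that load, with a made-up function name and a flat array standing in for guest memory at DS:(E)BX:

```c
#include <stdint.h>

/* XLAT: AL = byte at [table base + zero-extended AL]. */
static uint8_t xlat(const uint8_t *table, uint8_t al)
{
    return table[al];
}
```
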

Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@vger.kernel.org # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: emulator: emulate AAM · a035d5c6

Authored by Paolo Bonzini on May 09, 2013

This is used by SGABIOS; without it KVM breaks with emulate_invalid_guest_state=1.

AAM needs the source operand to be unsigned; do the same in AAD as well
for consistency, even though it does not affect the result.
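
Why the operand must be unsigned can be seen from a small model of AAM, which divides AL by an immediate base and stores the quotient in AH and the remainder in AL. The function name is hypothetical; a zero base raises a divide error on hardware and is simply excluded here.

```c
#include <assert.h>
#include <stdint.h>

/* AAM imm8: AH = AL / imm8, AL = AL % imm8, with AL treated as unsigned.
 * Returns the resulting AX. */
static uint16_t aam(uint8_t al, uint8_t base)
{
    assert(base != 0); /* hardware raises #DE for a zero base */
    uint8_t ah = al / base;
    uint8_t lo = al % base;
    return (uint16_t)((ah << 8) | lo);
}
```

For AL = 200 and base 10 the unsigned result is AH = 20, AL = 0; interpreting 200 as the signed byte -56 would give a different quotient and remainder.
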

Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@vger.kernel.org # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: VMX: fix halt emulation while emulating invalid guest state · 8d76c49e

Authored by Gleb Natapov on May 08, 2013

The invalid guest state emulation loop does not check halt_request,
which causes a 100% cpu loop while the guest is halted and in an invalid
state. The more serious issue is that this leaves halt_request set, so
a random instruction emulated by a vm86 #GP exit can be interpreted
as a halt, which causes a guest hang. Fix both problems by handling
halt_request in the emulation loop.

Reported-by: Tomas Papan <tomas.papan@gmail.com>
Tested-by: Tomas Papan <tomas.papan@gmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>

08 May, 2013 · 4 commits

xen: mask x2APIC feature in PV · 4ea9b9ac

Authored by Zhenzhong Duan on Mar 29, 2013

On an x2apic-enabled pvm, doing sysrq+l triggers the NULL pointer dereference below.

    SysRq : Show backtrace of all active CPUs
    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<ffffffff8125e3cb>] memcpy+0xb/0x120
    Call Trace:
     [<ffffffff81039633>] ? __x2apic_send_IPI_mask+0x73/0x160
     [<ffffffff8103973e>] x2apic_send_IPI_all+0x1e/0x20
     [<ffffffff8103498c>] arch_trigger_all_cpu_backtrace+0x6c/0xb0
     [<ffffffff81501be4>] ? _raw_spin_lock_irqsave+0x34/0x50
     [<ffffffff8131654e>] sysrq_handle_showallcpus+0xe/0x10
     [<ffffffff8131616d>] __handle_sysrq+0x7d/0x140
     [<ffffffff81316230>] ? __handle_sysrq+0x140/0x140
     [<ffffffff81316287>] write_sysrq_trigger+0x57/0x60
     [<ffffffff811ca996>] proc_reg_write+0x86/0xc0
     [<ffffffff8116dd8e>] vfs_write+0xce/0x190
     [<ffffffff8116e3e5>] sys_write+0x55/0x90
     [<ffffffff8150a242>] system_call_fastpath+0x16/0x1b

That's because apic points to apic_x2apic_cluster or apic_x2apic_phys
but the basic element like cpumask isn't initialized.

Mask x2APIC feature in pvm to avoid overwrite of apic pointer,
update commit message per Konrad's suggestion.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Tested-by: Tamon Shiose <tamon.shiose@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/spinlock: Fix check from greater than to also be greater or equal to. · cb91f8f4

Authored by Konrad Rzeszutek Wilk on May 06, 2013

During review of git commit cb9c6f15
("xen/spinlock:  Check against default value of -1 for IRQ line.")
Stefano pointed out a bug in the patch. Unfortunately, due to vacation
timing the fix was not applied, and this patch fixes it up.

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/smp/pvhvm: Don't point per_cpu(xen_vcpu, 33 and larger) to shared_info · d5b17dbf

Authored by Konrad Rzeszutek Wilk on May 05, 2013

As it will point to some data, but not event channel data (the
shared_info has an array limited to 32).

This means that for PVHVM guests with more than 32 VCPUs without
the usage of VCPUOP_register_vcpu_info any interrupts to VCPUs
larger than 32 would have gone unnoticed during early bootup.

That is OK, as during early bootup, in smp_init we end up calling
the hotplug mechanism (xen_hvm_cpu_notify) which makes the
VCPUOP_register_vcpu_info call for all VCPUs and we can receive
interrupts on VCPUs 33 and further.

This is just a cleanup.

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

KVM: x86: fix maintenance of guest/host xcr0 state · 42bdf991

Authored by Marcelo Tosatti on Apr 15, 2013

Emulation of xcr0 writes zeroes the guest_xcr0_loaded variable so that
the subsequent VM-entry reloads the CPU's xcr0 with the guest's xcr0 value.

However, this is incorrect because the guest_xcr0_loaded variable is
read to decide whether to reload the host's xcr0.

In case the vcpu thread is scheduled out after the guest_xcr0_loaded = 0
assignment, and the scheduler decides to preload the FPU:

switch_to
  __switch_to
    __math_state_restore
      restore_fpu_checking
        fpu_restore_checking
          if (use_xsave())
            fpu_xrstor_checking
              xrstor64 with CPU's xcr0 == guest's xcr0

Fix by properly restoring the host's xcr0 during emulation of xcr0 writes.

Analyzed-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

07 May, 2013 · 2 commits

x86 rwsem: avoid taking slow path when stealing write lock · a31a369b

Authored by Michel Lespinasse on May 07, 2013

Modify __down_write[_nested] and __down_write_trylock to grab the write
lock whenever the active count is 0, even if there are queued waiters
(they must be writers pending wakeup, since the active count is 0).

Note that this is an optimization only; architectures without this
optimization will still work fine:

- __down_write() would take the slow path which would take the wait_lock
  and then try stealing the lock (as in the spinlocked rwsem implementation)

- __down_write_trylock() would fail, but callers must be ready to deal
  with that - since there are some writers pending wakeup, they could
  have raced with us and obtained the lock before we steal it.
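
The stealing rule can be illustrated with C11 atomics. This is a deliberately simplified model: the count encoding (`ACTIVE_MASK`, `WAITING_FLAG`) and all names are assumptions made for the sketch and do not match the real x86 rwsem layout or its assembly fast path.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define ACTIVE_MASK  0xffffL   /* model: low bits count active lock holders */
#define ACTIVE_BIAS  1L
#define WAITING_FLAG 0x10000L  /* model: set while waiters are queued */

struct model_rwsem { atomic_long count; };

/*
 * Take the write lock whenever the active count is 0, even if
 * WAITING_FLAG is set, instead of failing because waiters exist.
 */
static bool model_down_write_trylock(struct model_rwsem *sem)
{
    long old = atomic_load(&sem->count);

    while ((old & ACTIVE_MASK) == 0) {
        /* On failure, old is refreshed with the current count. */
        if (atomic_compare_exchange_weak(&sem->count, &old, old + ACTIVE_BIAS))
            return true;
    }
    return false; /* someone holds the lock */
}
```

The compare-and-exchange loop is what makes the steal safe against a pending writer that wakes up and grabs the lock at the same moment: only one of them sees the count unchanged.
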

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xen/vcpu: Document the xen_vcpu_info and xen_vcpu · a520996a

Authored by Konrad Rzeszutek Wilk on May 05, 2013

They are important structures and it is not clear at first
look what they are for.

The xen_vcpu is a pointer. By default it points to the shared_info
structure (at the CPU offset location). However if the
VCPUOP_register_vcpu_info hypercall is implemented we can make the
xen_vcpu pointer point to a per-CPU location.

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Added comments from Ian Campbell]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

06 May, 2013 · 1 commit

xen/vcpu/pvhvm: Fix vcpu hotplugging hanging. · 7f1fc268

Authored by Konrad Rzeszutek Wilk on May 05, 2013

If a user did:

	echo 0 > /sys/devices/system/cpu/cpu1/online
	echo 1 > /sys/devices/system/cpu/cpu1/online

we would (this is a build with DEBUG enabled) get to:
smpboot: ++++++++++++++++++++=_---CPU UP  1
.. snip..
smpboot: Stack at about ffff880074c0ff44
smpboot: CPU1: has booted.

and hang. The RCU mechanism would kick in and try to IPI CPU1,
but the IPIs (and all other interrupts) would never arrive at
CPU1. At first glance at least. A bit of digging in the hypervisor
trace shows that (using xenanalyze):

[vla] d4v1 vec 243 injecting
   0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
]  0.043163639 --|x d4v1 vmentry cycles 1468
]  0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
   0.043164913 --|x d4v1 inj_virq vec 243  real
  [vla] d4v1 vec 243 injecting
   0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
]  0.043165526 --|x d4v1 vmentry cycles 1472
]  0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
   0.043166800 --|x d4v1 inj_virq vec 243  real
  [vla] d4v1 vec 243 injecting

there is a pending event (subsequent debugging shows it is the IPI
from the VCPU0 when smpboot.c on VCPU1 has done
"set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is
interrupted with the callback IPI (0xf3 aka 243) which ends up calling
__xen_evtchn_do_upcall.

The __xen_evtchn_do_upcall seems to do *something* but not acknowledge
the pending events. And the moment the guest does a 'cli' (that is the
ffffffff81673254 in the log above) the hypervisor is invoked again to
inject the IPI (0xf3) to tell the guest it has pending interrupts.
This repeats itself forever.

The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup
we set each per_cpu(xen_vcpu, cpu) to point to the
shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info
to register per-CPU structures (xen_vcpu_setup).
This is used to allow events for more than 32 VCPUs and for performance
optimization reasons.

When the user performs the VCPU hotplug we end up calling
xen_vcpu_setup once more. We make the hypercall, which returns
-EINVAL as it does not allow multiple registration calls (and
has already re-assigned where the events are being set). We pick
the fallback case and set per_cpu(xen_vcpu, cpu) to point to
shared_info->vcpu_info[vcpu] (which is a good fallback during bootup).
However, the hypervisor is still setting events in the registered
per-cpu structure (per_cpu(xen_vcpu_info, cpu)).

As such when the events are set by the hypervisor (such as timer one),
and when we iterate in __xen_evtchn_do_upcall we end up reading stale
events from the shared_info->vcpu_info[vcpu] instead of the
per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the
events that the hypervisor has set and the hypervisor keeps on reminding
us to ack the events which we never do.

The fix is simple: when xen_vcpu_setup is called a second time,
don't overwrite per_cpu(xen_vcpu, cpu) if it already points to
per_cpu(xen_vcpu_info).
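
The failure and the fix can be reproduced in a small userspace model. Everything below is a sketch: the arrays stand in for per-CPU data and for hypervisor state, `register_vcpu_info` is a stand-in for the VCPUOP_register_vcpu_info hypercall, and `vcpu_setup_model` only mirrors the control flow described above.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4 /* small model; real limits differ */

struct vcpu_info_model { int pending; };

static struct vcpu_info_model shared_info_vcpu[NR_CPUS]; /* shared_info->vcpu_info[] */
static struct vcpu_info_model xen_vcpu_info[NR_CPUS];    /* per-CPU structures */
static struct vcpu_info_model *xen_vcpu[NR_CPUS];        /* per_cpu(xen_vcpu, cpu) */
static bool hv_registered[NR_CPUS];                      /* hypervisor-side state */

/* Stand-in for the hypercall: a second registration is rejected. */
static int register_vcpu_info(int cpu)
{
    if (hv_registered[cpu])
        return -EINVAL;
    hv_registered[cpu] = true;
    return 0;
}

static void vcpu_setup_model(int cpu)
{
    /* The fix: keep a pointer that already targets the per-CPU structure. */
    if (xen_vcpu[cpu] == &xen_vcpu_info[cpu])
        return;

    xen_vcpu[cpu] = &shared_info_vcpu[cpu]; /* boot-time fallback */
    if (register_vcpu_info(cpu) == 0)
        xen_vcpu[cpu] = &xen_vcpu_info[cpu];
}
```

Without the early return, the second call would hit -EINVAL and leave the pointer on the shared_info slot, which the hypervisor no longer updates.
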

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

05 May, 2013 · 1 commit

perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL · 7cc23cd6

Authored by Peter Zijlstra on May 03, 2013

We should always have proper privileges when requesting kernel
data.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.230745028@chello.nl
[ Fix build error reported by fengguang.wu@intel.com, propagate error code back. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-v0x9ky3ahzr6nm3c6ilwrili@git.kernel.org

openeuler / Kernel · synced successfully almost 2 years ago
