Commits · 64be5007066173d11a4635eedd57d41a3b3a7027 · openanolis / cloud-kernel

12 Jan, 2011 25 commits

KVM: x86: trace "exit to userspace" event · 64be5007

Committed by Gleb Natapov on Oct 24, 2010

Add tracepoint for userspace exit.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

64be5007

KVM: propagate fault r/w information to gup(), allow read-only memory · 612819c3

Committed by Marcelo Tosatti on Oct 22, 2010

As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
to writable if host pte allows it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

612819c3

KVM: MMU: flush TLBs on writable -> read-only spte overwrite · 7905d9a5

Committed by Marcelo Tosatti on Oct 22, 2010

This can happen in the following scenario:

vcpu0			vcpu1
read fault
gup(.write=0)
			gup(.write=1)
			reuse swap cache, no COW
			set writable spte
			use writable spte
set read-only spte
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

7905d9a5

KVM: MMU: remove kvm_mmu_set_base_ptes · 982c2565

Committed by Marcelo Tosatti on Oct 22, 2010

Unused.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

982c2565

KVM: VMX: remove setting of shadow_base_ptes for EPT · ff1fcb9e

Committed by Marcelo Tosatti on Oct 22, 2010

The EPT present/writable bits use the same position as normal
pagetable bits.

Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting
the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte.

Also pass present/writable error code information from EPT violation
to generic pagefault handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

ff1fcb9e

KVM: Avoid double interrupt injection with vapic · 83bcacb1

Committed by Avi Kivity on Oct 25, 2010

After an interrupt injection, the PPR changes, and we have to reflect that
into the vapic. This causes a KVM_REQ_EVENT to be set, which causes the
whole interrupt injection routine to be run again (harmlessly).

Optimize by only setting KVM_REQ_EVENT if the ppr was lowered; otherwise
there is no chance that a new injection is needed.
Signed-off-by: Avi Kivity <avi@redhat.com>

83bcacb1
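
The optimization described in the commit above can be illustrated with a minimal sketch. The struct and function names here are hypothetical stand-ins, not the kernel's own; the boolean simply models KVM_REQ_EVENT being set. The idea shown is only the comparison: an event re-evaluation is requested when the PPR drops, and skipped otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-vcpu state; not the kernel's struct. */
struct demo_vcpu {
    uint32_t ppr;        /* last processor priority value seen */
    bool event_pending;  /* models KVM_REQ_EVENT being set */
};

/* Record the new PPR and request event re-evaluation only if it was lowered. */
static void demo_update_ppr(struct demo_vcpu *vcpu, uint32_t new_ppr)
{
    uint32_t old_ppr = vcpu->ppr;

    vcpu->ppr = new_ppr;
    if (new_ppr < old_ppr)
        vcpu->event_pending = true;
}
```

A raised or unchanged PPR cannot unmask a pending interrupt, which is why the sketch leaves the flag untouched in that case.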

KVM: SVM: Fold save_host_msrs() and load_host_msrs() into their callers · 82ca2d10

Committed by Avi Kivity on Oct 21, 2010

This abstraction only serves to obfuscate.  Remove.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

82ca2d10

KVM: SVM: Move fs/gs/ldt save/restore to heavyweight exit path · dacccfdd

Committed by Avi Kivity on Oct 21, 2010

ldt is never used in the kernel context; same goes for fs (x86_64) and gs
(i386).  So save/restore them in the heavyweight exit path instead
of the lightweight path.

By itself, this doesn't buy us much, but it paves the way for moving vmload
and vmsave to the heavyweight exit path, since they modify the same registers.

[jan: fix copy/paste mistake on i386]
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

dacccfdd

KVM: SVM: Move svm->host_gs_base into a separate structure · afe9e66f

Committed by Avi Kivity on Oct 21, 2010

More members will join it soon.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

afe9e66f

KVM: SVM: Move guest register save out of interrupts disabled section · 13c34e07

Committed by Avi Kivity on Oct 21, 2010

Saving guest registers is just a memory copy, and does not need to be in the
critical section.  Move outside the critical section to improve latency a
bit.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

13c34e07

KVM: x86: Add missing inline tag to kvm_read_and_reset_pf_reason · d4c90b00

Committed by Jan Kiszka on Oct 20, 2010

May otherwise generate build warnings about unused
kvm_read_and_reset_pf_reason if included without CONFIG_KVM_GUEST
enabled.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d4c90b00

KVM: Move KVM context switch into own function · f56f5369

Committed by Andi Kleen on Oct 20, 2010

gcc 4.5 with some special options is able to duplicate the VMX
context switch asm in vmx_vcpu_run(). This results in a compile error
because the inline asm sequence uses a non-local label. The non-local
label is needed because other code wants to set up the return address.

This patch moves the asm code into its own function and marks
that explicitly noinline to avoid this problem.

Better would be probably to just move it into an .S file.

The diff looks worse than the change really is, it's all just
code movement and no logic change.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f56f5369

KVM: x86: Mark kvm_arch_setup_async_pf static · 7e1fbeac

Committed by Jan Kiszka on Oct 20, 2010

It has no user outside mmu.c and also no prototype.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7e1fbeac

KVM: improve hva_to_pfn() readability · 8030089f

Committed by Gleb Natapov on Oct 19, 2010

Improve vma handling code readability in hva_to_pfn() and fix
async pf handling code to properly check vma returned by find_vma().
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

8030089f

KVM: Send async PF when guest is not in userspace too. · fc5f06fa

Committed by Gleb Natapov on Oct 14, 2010

If guest indicates that it can handle async pf in kernel mode too, send
it, but only if interrupts are enabled.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

fc5f06fa

KVM: Let host know whether the guest can handle async PF in non-userspace context. · 6adba527

Committed by Gleb Natapov on Oct 14, 2010

If guest can detect that it runs in non-preemptable context it can
handle async PFs at any time, so let host know that it can send async
PF even if guest cpu is not in userspace.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

6adba527

KVM paravirt: Handle async PF in non preemptable context · 6c047cd9

Committed by Gleb Natapov on Oct 14, 2010

If async page fault is received by idle task or when preempt_count is
not zero guest cannot reschedule, so do sti; hlt and wait for page to be
ready. vcpu can still process interrupts while it waits for the page to
be ready.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

6c047cd9

KVM: Inject asynchronous page fault into a PV guest if page is swapped out. · 7c90705b

Committed by Gleb Natapov on Oct 14, 2010

Send async page fault to a PV guest if it accesses swapped out memory.
Guest will choose another task to run upon receiving the fault.

Allow async page fault injection only when guest is in user mode since
otherwise guest may be in non-sleepable context and will not be able
to reschedule.

Vcpu will be halted if guest will fault on the same page again or if
vcpu executes kernel code.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7c90705b

KVM: Handle async PF in a guest. · 631bc487

Committed by Gleb Natapov on Oct 14, 2010

When async PF capability is detected hook up special page fault handler
that will handle async page fault events and bypass other page faults to
regular page fault handler. Also add async PF handling to nested SVM
emulation. Async PF always generates exit to L1 where vcpu thread will
be scheduled out until page is available.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

631bc487

KVM paravirt: Add async PF initialization to PV guest. · fd10cde9

Committed by Gleb Natapov on Oct 14, 2010

Enable async PF in a guest if async PF capability is discovered.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

fd10cde9

KVM: Add PV MSR to enable asynchronous page faults delivery. · 344d9588

Committed by Gleb Natapov on Oct 14, 2010

Guest enables async PF vcpu functionality using this MSR.
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

344d9588

KVM paravirt: Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c. · ca3f1017

Committed by Gleb Natapov on Oct 14, 2010

Async PF also needs to hook into smp_prepare_boot_cpu so move the hook
into generic code.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ca3f1017

KVM: Add memory slot versioning and use it to provide fast guest write interface · 49c7754c

Committed by Gleb Natapov on Oct 18, 2010

Keep track of memslots changes by keeping generation number in memslots
structure. Provide kvm_write_guest_cached() function that skips
gfn_to_hva() translation if memslots was not changed since previous
invocation.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

49c7754c
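
The generation-number idea in the commit above can be sketched in a few lines of user-space C. This is an illustration under stated assumptions, not the kernel's kvm_write_guest_cached(): every `demo_` name is hypothetical, a single flat buffer stands in for the memslots, and a counter tracks how often the slow translation path runs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single-slot stand-in for the memslots structure. */
struct demo_memslots {
    uint64_t generation;  /* bumped whenever the slot layout changes */
    uint8_t *base;        /* host buffer backing guest address 0 */
    size_t size;
};

/* Hypothetical cache of one guest-address to host-address translation. */
struct demo_cache {
    uint64_t generation;
    uint64_t gpa;
    uint8_t *hva;
    int valid;
};

static int demo_translations;  /* counts slow-path translations */

/* Slow path: translate a guest address to a host pointer. */
static uint8_t *demo_gpa_to_hva(struct demo_memslots *s, uint64_t gpa)
{
    demo_translations++;
    return gpa < s->size ? s->base + gpa : NULL;
}

/* Write to guest memory, re-translating only if the cache is stale. */
static int demo_write_guest_cached(struct demo_memslots *s,
                                   struct demo_cache *c, uint64_t gpa,
                                   const void *data, size_t len)
{
    if (!c->valid || c->generation != s->generation || c->gpa != gpa) {
        c->hva = demo_gpa_to_hva(s, gpa);
        c->gpa = gpa;
        c->generation = s->generation;
        c->valid = 1;
    }
    if (!c->hva || len > s->size - gpa)
        return -1;
    memcpy(c->hva, data, len);
    return 0;
}
```

Repeated writes to the same guest address skip the translation until the generation changes, at which point the cached pointer is refreshed once.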

KVM: Retry fault before vmentry · 56028d08

Committed by Gleb Natapov on Oct 17, 2010

When page is swapped in it is mapped into guest memory only after guest
tries to access it again and generate another fault. To save this fault
we can map it immediately since we know that guest is going to access
the page. Do it only when tdp is enabled for now. Shadow paging case is
more complicated. CR[034] and EFER registers should be switched before
doing mapping and then switched back.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

56028d08

KVM: Halt vcpu if page it tries to access is swapped out · af585b92

Committed by Gleb Natapov on Oct 14, 2010

If a guest accesses swapped out memory do not swap it in from vcpu thread
context. Schedule work to do swapping and put vcpu into halted state
instead.

Interrupts will still be delivered to the guest and if interrupt will
cause reschedule guest will continue to run another task.

[avi: remove call to get_user_pages_noio(), nacked by Linus; this
      makes everything synchronous again]
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

af585b92

02 Jan, 2011 2 commits

KVM: Don't reset mmu context unnecessarily when updating EFER · 010c520e

Committed by Avi Kivity on Oct 11, 2010

The only bit of EFER that affects the mmu is NX, and this is already
accounted for (LME only takes effect when changing cr0).

Based on a patch by Hillf Danton.
Signed-off-by: Avi Kivity <avi@redhat.com>

010c520e

KVM: i8259: initialize isr_ack · d0dfc6b7

Committed by Avi Kivity on Dec 31, 2010

isr_ack is never initialized.  So, until the first PIC reset, interrupts
may fail to be injected.  This can cause Windows XP to fail to boot, as
reported in the fallout from the fix to
https://bugzilla.kernel.org/show_bug.cgi?id=21962.
Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d0dfc6b7

29 Dec, 2010 1 commit

KVM: MMU: Fix incorrect direct gfn for unpaged mode shadow · 649497d1

Committed by Avi Kivity on Dec 28, 2010

We use the physical address instead of the base gfn for the four
PAE page directories we use in unpaged mode.  When the guest accesses
an address above 1GB that is backed by a large host page, a BUG_ON()
in kvm_mmu_set_gfn() triggers.

Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=21962
Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com>
KVM-Stable-Tag.
Signed-off-by: Avi Kivity <avi@redhat.com>

649497d1
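
The distinction between a physical address and a guest frame number (gfn) that the commit above turns on can be shown with a small sketch. It assumes 4 KiB pages and that each of the four PAE page directories covers 1 GiB; the macro and function names are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Physical base address of the i-th 1 GiB region (i in 0..3). */
static uint64_t demo_pae_dir_base_addr(unsigned int i)
{
    return (uint64_t)i << 30;
}

/* Base gfn of the same region: the address divided by the page size. */
static uint64_t demo_pae_dir_base_gfn(unsigned int i)
{
    return (uint64_t)i << (30 - DEMO_PAGE_SHIFT);
}
```

For index 0 both values are zero, so confusing the two goes unnoticed until a region at or above 1 GiB is used, which is consistent with the failure described in the commit message.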

19 Dec, 2010 3 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile · 0a592281

Committed by Linus Torvalds on Dec 18, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  arch/tile: handle rt_sigreturn() more cleanly
  arch/tile: handle CLONE_SETTLS in copy_thread(), not user space

0a592281

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus · 2ba16c4f

Committed by Linus Torvalds on Dec 18, 2010

* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
  MIPS: Fix build errors in sc-mips.c

2ba16c4f

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 · 46bdfe6a

Committed by Linus Torvalds on Dec 18, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86: avoid high BIOS area when allocating address space
  x86: avoid E820 regions when allocating address space
  x86: avoid low BIOS area when allocating address space
  resources: add arch hook for preventing allocation in reserved areas
  Revert "resources: support allocating space within a region from the top down"
  Revert "PCI: allocate bus resources from the top down"
  Revert "x86/PCI: allocate space from the end of a region, not the beginning"
  Revert "x86: allocate space within a region top-down"
  Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode"
  PCI: Update MCP55 quirk to not affect non HyperTransport variants

46bdfe6a

18 Dec, 2010 9 commits

arch/tile: handle rt_sigreturn() more cleanly · 81711cee

Committed by Chris Metcalf on Dec 14, 2010

The current tile rt_sigreturn() syscall pattern uses the common idiom
of loading up pt_regs with all the saved registers from the time of
the signal, then anticipating the fact that we will clobber the ABI
"return value" register (r0) as we return from the syscall by setting
the rt_sigreturn return value to whatever random value was in the pt_regs
for r0.

However, this breaks in our 64-bit kernel when running "compat" tasks,
since we always sign-extend the "return value" register to properly
handle returned pointers that are in the upper 2GB of the 32-bit compat
address space. Doing this to the sigreturn path then causes occasional
random corruption of the 64-bit r0 register.

Instead, we stop doing the crazy "load the return-value register"
hack in sigreturn. We already have some sigreturn-specific assembly
code that we use to pass the pt_regs pointer to C code. We extend that
code to also set the link register to point to a spot a few instructions
after the usual syscall return address so we don't clobber the saved r0.
Now it no longer matters what the rt_sigreturn syscall returns, and the
pt_regs structure can be cleanly and completely reloaded.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>

81711cee
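
The corruption described in the commit above follows from plain integer arithmetic, which a short sketch can make concrete. The function name is hypothetical; it only models "keep the low 32 bits and sign-extend to 64 bits", which is what the commit says the compat return path does to the return-value register.

```c
#include <stdint.h>

/*
 * Model of the compat syscall-return treatment of r0: truncate to
 * 32 bits, then sign-extend back to 64 bits. Relies on two's-complement
 * conversion, which holds on all mainstream compilers.
 */
static uint64_t demo_compat_sign_extend(uint64_t r0)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)r0;
}
```

This is the right treatment for a pointer returned to a 32-bit task, but applied to a saved 64-bit r0 it discards the upper half, and replaces it with ones whenever bit 31 is set.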

arch/tile: handle CLONE_SETTLS in copy_thread(), not user space · bc4cf2bb

Committed by Chris Metcalf on Dec 14, 2010

Previously we were just setting up the "tp" register in the
new task as started by clone() in libc.  However, this is not
quite right, since in principle a signal might be delivered to
the new task before it had its TLS set up.  (Of course, this race
window still exists for resetting the libc getpid() cached value
in the new task, in principle.  But in any case, we are now doing
this exactly the way all other architectures do it.)

This change is important for 2.6.37 since the tile glibc we will
be submitting upstream will not set TLS in user space any more,
so it will only work on a kernel that has this fix.  It should
also be taken for 2.6.36.x in the stable tree if possible.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: stable <stable@kernel.org>

bc4cf2bb

MIPS: Fix build errors in sc-mips.c · 081d835f

Committed by Kevin Cernekee on Nov 02, 2010

Seen with malta_defconfig on Linus' tree:

  CC      arch/mips/mm/sc-mips.o
arch/mips/mm/sc-mips.c: In function 'mips_sc_is_activated':
arch/mips/mm/sc-mips.c:77: error: 'config2' undeclared (first use in this function)
arch/mips/mm/sc-mips.c:77: error: (Each undeclared identifier is reported only once
arch/mips/mm/sc-mips.c:77: error: for each function it appears in.)
arch/mips/mm/sc-mips.c:81: error: 'tmp' undeclared (first use in this function)
make[2]: *** [arch/mips/mm/sc-mips.o] Error 1
make[1]: *** [arch/mips/mm] Error 2
make: *** [arch/mips] Error 2

[Ralf: Cosmetic changes to minimize the number of arguments passed to
mips_sc_is_activated]
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/1752/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

081d835f

x86: avoid high BIOS area when allocating address space · a2c606d5

Committed by Bjorn Helgaas on Dec 16, 2010

This prevents allocation of the last 2MB before 4GB.

The experiment described here shows Windows 7 ignoring the last 1MB:
https://bugzilla.kernel.org/show_bug.cgi?id=23542#c27

This patch ignores the top 2MB instead of just 1MB because H. Peter Anvin
says "There will be ROM at the top of the 32-bit address space; it's a fact
of the architecture, and on at least older systems it was common to have a
shadow 1 MiB below."
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

a2c606d5

x86: avoid E820 regions when allocating address space · 4dc2287c

Committed by Bjorn Helgaas on Dec 16, 2010

When we allocate address space, e.g., to assign it to a PCI device, don't
allocate anything mentioned in the BIOS E820 memory map.

On recent machines (2008 and newer), we assign PCI resources from the
windows described by the ACPI PCI host bridge _CRS. On many Dell
machines, these windows overlap some E820 reserved areas, e.g.,

BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]

If we put devices at 0xbff00000, they don't work, probably because
that's really RAM, not I/O memory. This patch prevents that by removing
the 0xbfe4dc00-0xbfffffff area from the "available" resource.

I'm not very happy with this solution because Windows solves the problem
differently (it seems to ignore E820 reserved areas and it allocates
top-down instead of bottom-up; details at comment 45 of the bugzilla
below). That means we're vulnerable to BIOS defects that Windows would not
trip over. For example, if BIOS described a device in ACPI but didn't
mention it in E820, Windows would work fine but Linux would fail.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

4dc2287c
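
The Dell example in the commit above can be worked through with a small range-clipping sketch. The struct and function are hypothetical illustrations, not the kernel's resource code; the policy of keeping the larger remaining piece when a reserved range splits the window is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical inclusive address range; not the kernel's struct resource. */
struct demo_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Remove [rstart, rend] from *avail, keeping the larger remaining piece.
 * Assumes rstart > 0 and rend < UINT64_MAX so the +/- 1 cannot wrap.
 * If the reserved range covers all of *avail, the result is empty
 * (start > end).
 */
static void demo_clip(struct demo_range *avail, uint64_t rstart, uint64_t rend)
{
    uint64_t low = 0, high = 0;

    if (avail->end < rstart || avail->start > rend)
        return;  /* no overlap, nothing to remove */

    if (avail->start < rstart)
        low = rstart - avail->start;  /* bytes left below the reserved range */
    if (avail->end > rend)
        high = avail->end - rend;     /* bytes left above the reserved range */

    if (low > high)
        avail->end = rstart - 1;
    else
        avail->start = rend + 1;
}
```

With the host bridge window [0xbff00000-0xdfffffff] and the reserved area 0xbfe4dc00-0xbfffffff from the message, clipping leaves a window that starts at 0xc0000000, so no device is placed at 0xbff00000.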

x86: avoid low BIOS area when allocating address space · 30919b0b

Committed by Bjorn Helgaas on Dec 16, 2010

This implements arch_remove_reservations() so allocate_resource() can
avoid any arch-specific reserved areas.  This currently just avoids the
BIOS area (the first 1MB), but could be used for E820 reserved areas if
that turns out to be necessary.

We previously avoided this area in pcibios_align_resource().  This patch
moves the test from that PCI-specific path to a generic path, so *all*
resource allocations will avoid this area.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

30919b0b
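
A minimal sketch of the behavior the commit above describes, keeping allocations out of the first 1MB: the names below are hypothetical and the struct is a simplified stand-in for a resource, so this should be read as an illustration of the idea rather than the patch itself.

```c
#include <stdint.h>

#define DEMO_BIOS_END 0x00100000ULL  /* end of the first 1MB */

/* Hypothetical candidate range for an allocation; bounds are inclusive. */
struct demo_avail {
    uint64_t start;
    uint64_t end;
};

/* Push the start of the available range above the low BIOS area. */
static void demo_remove_low_bios(struct demo_avail *avail)
{
    if (avail->start < DEMO_BIOS_END)
        avail->start = DEMO_BIOS_END;
}
```

Because the adjustment is applied to the candidate range itself rather than inside a PCI-specific alignment callback, every caller that allocates from the range sees the same restriction.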

resources: add arch hook for preventing allocation in reserved areas · fcb11918

Committed by Bjorn Helgaas on Dec 16, 2010

This adds arch_remove_reservations(), which an arch can implement if it
needs to protect part of the address space from allocation.

Sometimes that can be done by just putting a region in the resource tree,
but there are cases where that doesn't work well.  For example, x86 BIOS
E820 reservations are not related to devices, so they may overlap part of,
all of, or more than a device resource, so they may not end up at the
correct spot in the resource tree.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

fcb11918

Revert "resources: support allocating space within a region from the top down" · c0f5ac54

Committed by Bjorn Helgaas on Dec 16, 2010

This reverts commit e7f8567d.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

c0f5ac54

Revert "PCI: allocate bus resources from the top down" · 6db45b76

Committed by Bjorn Helgaas on Dec 16, 2010

This reverts commit b126b470.

We're going back to the old behavior of allocating from bus resources
in _CRS order.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

6db45b76

openanolis / cloud-kernel synced successfully about 1 year ago