Commits · ab92f30875a7ec3e84644a5494febd8901e66742 · openanolis / cloud-kernel

08 Mar, 2016 9 commits

KVM: MMU: micro-optimize gpte_access · bb9eadf0

Committed by Paolo Bonzini on Feb 23, 2016

Avoid AND-NOT; most x86 processors lack an instruction for it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
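
The idea can be illustrated with a small userspace sketch. It is not the patch itself: the function names are hypothetical, and the bit positions (P = bit 0, R/W = bit 1, U/S = bit 2, NX = bit 63, with the "exec" access bit sharing bit 0) are assumptions taken from the x86 page-table layout. For a present entry, XOR-ing the NX bit into bit 0 gives the same result as setting the exec bit and then clearing it with AND-NOT.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout: P = bit 0, R/W = bit 1, U/S = bit 2, NX = bit 63. */
#define PT_PRESENT_MASK   (1ULL << 0)
#define PT_WRITABLE_MASK  (1ULL << 1)
#define PT_USER_MASK      (1ULL << 2)
#define PT64_NX_SHIFT     63
#define ACC_EXEC_MASK     1u	/* shares bit 0 with the present bit */

/* AND-NOT form: start as executable, then clear the bit if NX is set. */
static unsigned access_andnot(uint64_t gpte)
{
	unsigned access = (unsigned)(gpte & (PT_WRITABLE_MASK | PT_USER_MASK));

	access |= ACC_EXEC_MASK;
	access &= ~(unsigned)(gpte >> PT64_NX_SHIFT);
	return access;
}

/*
 * XOR form: only valid for present entries. Bit 0 (P) is already set,
 * so XOR with the NX bit clears it exactly when NX is set.
 */
static unsigned access_xor(uint64_t gpte)
{
	unsigned access = (unsigned)(gpte & (PT_WRITABLE_MASK | PT_USER_MASK |
					     PT_PRESENT_MASK));

	access ^= (unsigned)(gpte >> PT64_NX_SHIFT);
	return access;
}
```

Both helpers agree for every present entry; the XOR form simply avoids the separate NOT that a plain AND would need.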

KVM: MMU: simplify last_pte_bitmap · 6bb69c9b

Committed by Paolo Bonzini on Feb 23, 2016

Branch-free code is fun and everybody knows how much Avi loves it,
but last_pte_bitmap takes it a bit to the extreme.  Since the code
is simply doing a range check, like

	(level == 1 ||
	 ((gpte & PT_PAGE_SIZE_MASK) && level < N))

we can make it branch-free without storing the entire truth table;
it is enough to cache N.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
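
One way to make such a range check branch-free is to lean on unsigned wrap-around, as in the sketch below. The function names are hypothetical, PT_PAGE_SIZE_MASK is assumed to be bit 7, and the trick only holds under the stated assumptions that level >= 1 and N >= 2; it is an illustration of the technique, not necessarily the exact code of the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define PT_PAGE_SIZE_MASK (1u << 7)	/* assumed position of the PS bit */

/* Reference version with branches, matching the expression above. */
static bool is_last_branchy(unsigned level, unsigned gpte, unsigned n)
{
	return level == 1 || ((gpte & PT_PAGE_SIZE_MASK) && level < n);
}

/* Branch-free version; assumes level >= 1 and n >= 2. */
static bool is_last_branchfree(unsigned level, unsigned gpte, unsigned n)
{
	/* level - 2 wraps to all-ones only when level == 1: force PS on. */
	gpte |= level - 1 - 1;

	/*
	 * level - n wraps (bit 7 set) iff level < n; otherwise it is a
	 * small non-negative number with bit 7 clear, which clears PS.
	 */
	gpte &= level - n;

	return gpte & PT_PAGE_SIZE_MASK;
}
```

Only N has to be cached per MMU; the truth table is recomputed from two subtractions and two bitwise operations.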

KVM: MMU: coalesce more page zapping in mmu_sync_children · 50c9e6f3

Committed by Paolo Bonzini on Feb 25, 2016

mmu_sync_children can only process up to 16 pages at a time.  Check
if we need to reschedule, and do not bother zapping the pages until
that happens.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: move zap/flush to kvm_mmu_get_page · 2a74003a

Committed by Paolo Bonzini on Feb 24, 2016

kvm_mmu_get_page is the only caller of kvm_sync_page_transient
and kvm_sync_pages.  Moving the handling of the invalid_list there
removes the need for the underdocumented kvm_sync_page_transient
function.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: invert return value of mmu.sync_page and *kvm_sync_page* · 1f50f1b3

Committed by Paolo Bonzini on Feb 24, 2016

Return true if the page was synced (and the TLB must be flushed)
and false if the page was zapped.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: cleanup __kvm_sync_page and its callers · 9a43c5d9

Committed by Paolo Bonzini on Feb 24, 2016

Calling kvm_unlink_unsync_page in the middle of __kvm_sync_page makes
things unnecessarily tricky.  If kvm_mmu_prepare_zap_page is called,
it will call kvm_unlink_unsync_page too.  So kvm_unlink_unsync_page can
be called just as well at the beginning or the end of __kvm_sync_page...
which means that we might do it in kvm_sync_page too and remove the
parameter.

kvm_sync_page ends up being the same code that kvm_sync_pages used
to have before the previous patch.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: use kvm_sync_page in kvm_sync_pages · df748f86

Committed by Paolo Bonzini on Feb 24, 2016

If the last argument is true, kvm_unlink_unsync_page is called anyway in
__kvm_sync_page (either by kvm_mmu_prepare_zap_page or by __kvm_sync_page
itself).  Therefore, kvm_sync_pages can just call kvm_sync_page, instead
of going through kvm_unlink_unsync_page+__kvm_sync_page.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: move TLB flush out of __kvm_sync_page · 35a70510

Committed by Paolo Bonzini on Feb 24, 2016

By doing this, kvm_sync_pages can use __kvm_sync_page instead of
reinventing it.  Because of kvm_mmu_flush_or_zap, the code does not
end up being more complex than before, and more cleanups to kvm_sync_pages
will come in the next patches.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: introduce kvm_mmu_flush_or_zap · b8c67b7a

Committed by Paolo Bonzini on Feb 24, 2016

This is a generalization of mmu_pte_write_flush_tlb that also
takes care of calling kvm_mmu_commit_zap_page.  The next
patches will introduce more uses.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

05 Mar, 2016 1 commit

KVM: i8254: drop local copy of mul_u64_u32_div · 0e4d4415

Committed by Paolo Bonzini on Mar 04, 2016

A function that does the same as i8254.c's muldiv64 has been added
(for KVM's own use, in fact!) in include/linux/math64.h.  Use it
instead of muldiv64.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
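
For readers unfamiliar with this kind of helper, here is a portable userspace sketch of a 64-bit by 32-bit multiply followed by a 32-bit divide that keeps the full 96-bit intermediate product. The name muldiv_sketch is hypothetical and the code is an illustration of the technique, not a copy of the kernel implementation; like any such helper it truncates the quotient if it does not fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Computes (a * mul) / div without overflowing the intermediate product. */
static uint64_t muldiv_sketch(uint64_t a, uint32_t mul, uint32_t div)
{
	/* Low 32 bits of a times mul: at most 64 bits wide. */
	uint64_t lo = (a & 0xffffffffULL) * mul;
	/* Bits 32..95 of the product; the sum cannot overflow 64 bits. */
	uint64_t hi = (a >> 32) * mul + (lo >> 32);

	/* Long division, 32 bits at a time. */
	uint64_t q_hi = hi / div;
	uint64_t rem  = hi % div;	/* rem < div, so it fits in 32 bits */
	uint64_t q_lo = ((rem << 32) | (lo & 0xffffffffULL)) / div;

	return (q_hi << 32) | q_lo;
}
```

For example, muldiv_sketch(1ULL << 63, 4, 8) yields 1ULL << 62, whereas a naive 64-bit `a * mul / div` would overflow in the multiplication.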

04 Mar, 2016 19 commits

KVM: MMU: check kvm_mmu_pages and mmu_page_path indices · e23d3fef

Committed by Xiao Guangrong on Feb 24, 2016

Give a special invalid index to the root of the walk, so that we
can check the consistency of kvm_mmu_pages and mmu_page_path.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Extracted from a bigger patch proposed by Guangrong. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: Fix ubsan warnings · 0a47cd85

Committed by Paolo Bonzini on Feb 23, 2016

kvm_mmu_pages_init is doing some really yucky stuff. It is setting
up a sentinel for mmu_pages_clear_parents; however, because of a) the
way levels are numbered starting from 1 and b) the way mmu_page_path
sizes its arrays with PT64_ROOT_LEVEL-1 elements, the access can be
out of bounds. This is harmless because the code overwrites up to the
first two elements of parents->idx and these are initialized, and
because the sentinel is not needed in this case---mmu_pages_clear_parents
exits anyway when it gets to the end of the array. However, ubsan
complains, and everyone else should too.

This fix does three things. First it makes the mmu_page_path arrays
PT64_ROOT_LEVEL elements in size, so that we can write to them without
checking the level in advance. Second it disintegrates kvm_mmu_pages_init
between mmu_unsync_walk (to reset the struct kvm_mmu_pages) and
for_each_sp (to place the NULL sentinel at the end of the current path).
This is okay because the mmu_page_path is only used in
mmu_pages_clear_parents; mmu_pages_clear_parents itself is called within
a for_each_sp iterator, and hence always after a call to mmu_pages_next.
Third it changes mmu_pages_clear_parents to just use the sentinel to
stop iteration, without checking the bounds on level.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: cleanup handle_abnormal_pfn · 798e88b3

Committed by Paolo Bonzini on Feb 23, 2016

The goto and temporary variable are unnecessary; just use return
statements.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: VMX: use vmcs_clear/set_bits for debug register exits · 8f22372f

Committed by Paolo Bonzini on Feb 26, 2016

Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: turn kvm_kpit_state.reinject into atomic_t · a0aace5a

Committed by Radim Krčmář on Mar 02, 2016

Document possible races between readers and concurrent update to the
ioctl.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: move PIT timer function initialization · ab4c1476

Committed by Radim Krčmář on Mar 02, 2016

We can do it just once.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: don't assume layout of kvm_kpit_state · 34f3941c

Committed by Radim Krčmář on Mar 02, 2016

channels has offset 0 and correct size now, but that can change.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: remove pointless dereference of PIT · 4a2095df

Committed by Radim Krčmář on Mar 02, 2016

PIT is known at that point.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: remove pit and kvm from kvm_kpit_state · a3e13115

Committed by Radim Krčmář on Mar 02, 2016

kvm isn't ever used and pit can be accessed with container_of.
If you *really* need kvm, use pit_state_to_pit(ps)->kvm.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
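
The container_of pattern mentioned above can be sketched in userspace as follows. The structure layouts are simplified stand-ins (the field lists are assumptions, not the kernel definitions), and container_of is re-implemented with offsetof; only the name pit_state_to_pit comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumed layouts). */
struct kvm;

struct kvm_kpit_state {
	int reinject;
};

struct kvm_pit {
	struct kvm *kvm;
	struct kvm_kpit_state pit_state;	/* embedded, not a pointer */
};

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing kvm_pit from a pointer to its embedded state. */
static struct kvm_pit *pit_state_to_pit(struct kvm_kpit_state *ps)
{
	return container_of(ps, struct kvm_pit, pit_state);
}
```

Because the state is embedded in struct kvm_pit, back-pointers stored inside the state are redundant: the parent can always be computed from the member's address.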

KVM: i8254: refactor kvm_free_pit · 08e5ccf3

Committed by Radim Krčmář on Mar 02, 2016

Could be easier to read, but git history will become deeper.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: refactor kvm_create_pit · 10d24821

Committed by Radim Krčmář on Mar 02, 2016

Locks are gone, so we don't need to duplicate error paths.
Use goto everywhere.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: remove notifiers from PIT discard policy · 71474e2f

Committed by Radim Krčmář on Mar 02, 2016

Discard policy doesn't rely on information from notifiers, so we don't
need to register notifiers unconditionally.  We kept correct counts in
case userspace switched between policies during runtime, but that can be
avoided by resetting the state.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: remove unnecessary uses of PIT state lock · b39c90b6

Committed by Radim Krčmář on Mar 02, 2016

- kvm_create_pit had to lock only because it exposed kvm->arch.vpit very
  early, but initialization doesn't use kvm->arch.vpit since the last
  patch, so we can drop locking.
- kvm_free_pit is only run after there are no users of KVM and therefore
  is the sole actor.
- Locking in kvm_vm_ioctl_reinject doesn't do anything, because reinject
  is only protected at that place.
- kvm_pit_reset isn't used anywhere and its locking can be dropped if we
  hide it.

Removing useless locking makes it possible to see what is actually being
protected by the PIT state lock (values accessible from the guest).
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: pass struct kvm_pit instead of kvm in PIT · 09edea72

Committed by Radim Krčmář on Mar 02, 2016

This patch passes struct kvm_pit into internal PIT functions.
Those functions used to get PIT through kvm->arch.vpit, even though most
of them never used *kvm for other purposes.  Another benefit is that we
don't need to set kvm->arch.vpit during initialization.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: tone down WARN_ON pit.state_lock · b69d920f

Committed by Radim Krčmář on Mar 02, 2016

If the guest could hit this, it would hang the host kernel, because of
the sheer number of those reports.  Internal callers have to be sensible
anyway, so we now only check for it in an API function.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: use atomic_t instead of pit.inject_lock · ddf54503

Committed by Radim Krčmář on Mar 02, 2016

The lock was overkill; the same can be done with atomics.

A mb() was added in kvm_pit_ack_irq, to pair with implicit barrier
between pit_timer_fn and pit_do_work.  The mb() prevents a race that
could happen if pending == 0 and irq_ack == 0:

  kvm_pit_ack_irq:                | pit_timer_fn:
   p = atomic_read(&ps->pending); |
                                  |  atomic_inc(&ps->pending);
                                  |  queue_work(pit_do_work);
                                  | pit_do_work:
                                  |  atomic_xchg(&ps->irq_ack, 0);
                                  |  return;
   atomic_set(&ps->irq_ack, 1);   |
   if (p == 0) return;            |

where the interrupt would not be delivered in this tick of pit_timer_fn.
PIT would have eventually delivered the interrupt, but we sacrifice
performance to make sure that interrupts are not needlessly delayed.

sfence isn't enough: atomic_dec_if_positive does atomic_read first and
x86 can reorder loads before stores.  lfence isn't enough: store can
pass lfence, turning it into a nop.  A compiler barrier would be more
than enough as CPU needs to stall for unbelievably long to use fences.

This patch doesn't do anything in kvm_pit_reset_reinject, because any
order of resets can race, but the result differs by at most one
interrupt, which is ok, because it's the same result as if the reset
happened at a slightly different time.  (Original code didn't protect
the reset path with a proper lock, so users have to be robust.)
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
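
The ordering requirement described above (set irq_ack, full barrier, then read and conditionally decrement pending) can be sketched with C11 atomics. This is a userspace approximation under stated assumptions: the struct and function names are hypothetical, atomic_thread_fence(memory_order_seq_cst) stands in for the kernel's mb(), and dec_if_positive imitates atomic_dec_if_positive with a compare-and-swap loop. It is not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters discussed in the message. */
struct pit_state_sketch {
	atomic_int pending;	/* ticks waiting to be injected */
	atomic_int irq_ack;	/* 1 once the last interrupt was acknowledged */
};

/*
 * Decrement *v unless that would make it negative.
 * Returns the old value minus one, whether or not the store happened.
 */
static int dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	/* A failed CAS refreshes 'old' with the current value; retry. */
	while (old > 0 && !atomic_compare_exchange_weak(v, &old, old - 1))
		;
	return old - 1;
}

/* Ack path: publish irq_ack first, then a full fence, then read pending. */
static bool ack_irq_sketch(struct pit_state_sketch *ps)
{
	atomic_store_explicit(&ps->irq_ack, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* role of mb() */

	/* True when more ticks remain pending after this acknowledgement. */
	return dec_if_positive(&ps->pending) > 0;
}
```

The fence sits between a store and a later load, which is the one reordering x86 permits; that is why the message argues that neither sfence nor lfence alone would be sufficient.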

KVM: i8254: add kvm_pit_reset_reinject · fd700a00

Committed by Radim Krčmář on Mar 02, 2016

pit_state.pending and pit_state.irq_ack are always reset at the same
time.  Create a function for them.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: simplify atomics in kvm_pit_ack_irq · f6e0a0c1

Committed by Radim Krčmář on Mar 02, 2016

We already have a helper that does the same thing.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: i8254: change PIT discard tick policy · 7dd0fdff

Committed by Radim Krčmář on Mar 02, 2016

Discard policy uses ack_notifiers to prevent injection of PIT interrupts
before EOI from the last one.

This patch changes the policy to always try to deliver the interrupt,
which makes a difference when its vector is in ISR.
The old implementation would drop the interrupt, but the proposed one
injects it into IRR, like real hardware would.

The old policy breaks legacy NMI watchdogs, where PIT is used through
virtual wire (LVT0): PIT never sends an interrupt before receiving EOI,
thus a guest deadlock with disabled interrupts will stop NMIs.

Note that NMI doesn't do EOI, so PIT also had to send a normal interrupt
through IOAPIC.  (KVM's PIT is deeply rotten and luckily not used much
in modern systems.)

Even though there is a chance of regressions, I think we can fix the
LVT0 NMI bug without introducing a new tick policy.

Cc: <stable@vger.kernel.org>
Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

03 Mar, 2016 11 commits

KVM: MMU: apply page track notifier · 13d268ca

Committed by Xiao Guangrong on Feb 24, 2016

Register the notifier to receive write track events so that we can update
our shadow page table

This makes kvm_mmu_pte_write() the callback of the notifier; there is no
functional change
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: simplify mmu_need_write_protect · 5c520e90

Committed by Xiao Guangrong on Feb 24, 2016

Now that all non-leaf shadow pages are page tracked, if a gfn is not
tracked then no non-leaf shadow page exists for it, so we can directly
make the shadow page of the gfn unsync
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: use page track for non-leaf shadow pages · 56ca57f9

Committed by Xiao Guangrong on Feb 24, 2016

Non-leaf shadow pages are always write protected, so they can be users
of page track
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: page track: add notifier support · 0eb05bf2

Committed by Xiao Guangrong on Feb 24, 2016

A notifier list is introduced so that any node that wants to receive
track events can register on the list

Two APIs are introduced here:
- kvm_page_track_register_notifier(): register the notifier to receive
  track events

- kvm_page_track_unregister_notifier(): stop receiving track events by
  unregistering the notifier

The callback, node->track_write(), is called when a write access to a
write-tracked page happens
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
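
The register/unregister/callback pattern described above can be sketched with a plain singly linked list. Everything here is hypothetical and simplified: the type and function names are not the kernel's, the callback signature is an assumption, and the sketch omits the synchronization that a real implementation would need when the list is modified concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified notifier node. */
struct track_node {
	struct track_node *next;
	void (*track_write)(struct track_node *node, uint64_t gpa, int bytes);
};

struct track_head {
	struct track_node *first;
};

/* Start receiving write-track events. */
static void track_register(struct track_head *head, struct track_node *n)
{
	n->next = head->first;
	head->first = n;
}

/* Stop receiving write-track events. */
static void track_unregister(struct track_head *head, struct track_node *n)
{
	for (struct track_node **pp = &head->first; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->next = NULL;
			return;
		}
	}
}

/* On a write to a write-tracked page, call every registered node. */
static void track_write_event(struct track_head *head, uint64_t gpa, int bytes)
{
	for (struct track_node *n = head->first; n; n = n->next)
		if (n->track_write)
			n->track_write(n, gpa, bytes);
}

/* Example user: counts the write events it receives. */
struct counting_node {
	struct track_node node;	/* first member, so the cast below is valid */
	int writes;
};

static void count_write(struct track_node *node, uint64_t gpa, int bytes)
{
	(void)gpa;
	(void)bytes;
	((struct counting_node *)node)->writes++;
}
```

A user embeds the node in its own structure, registers it, and from then on receives one track_write() call per tracked write until it unregisters.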

KVM: MMU: clear write-flooding on the fast path of tracked page · e5691a81

Committed by Xiao Guangrong on Feb 24, 2016

If the page fault is caused by a write access to a write-tracked page, the
real shadow page walk is skipped and we lose the chance to clear write
flooding for the page structure the current vcpu is using

Fix it by locklessly walking the shadow page table to clear write flooding
on the shadow page structure outside of mmu-lock. To allow that, we change
the count to atomic_t
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: let page fault handler be aware tracked page · 3d0c27ad

Committed by Xiao Guangrong on Feb 24, 2016

A page fault caused by a write access to a write-tracked page cannot
be fixed; it always needs to be emulated. page_fault_handle_page_track()
is the fast path we introduce here to skip holding mmu-lock and walking
the shadow page table

However, if the page table is not present, it is worth making the page
table entry present and read-only so that read accesses succeed

mmu_need_write_protect() needs to be adjusted so that the page does not
become writable when making the page table present or when
syncing/prefetching shadow page table entries
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: page track: introduce kvm_slot_page_track_{add,remove}_page · f29d4d78

Committed by Xiao Guangrong on Feb 24, 2016

These two functions are the user APIs:
- kvm_slot_page_track_add_page(): add the page to the tracking pool;
  after that, the specified access on that page will be tracked

- kvm_slot_page_track_remove_page(): remove the page from the tracking
  pool; the specified access on the page is no longer tracked after the
  last user is gone

Both of these are called under the protection of both mmu-lock and
kvm->srcu or kvm->slots_lock
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: page track: add the framework of guest page tracking · 21ebbeda

Committed by Xiao Guangrong on Feb 24, 2016

The array gfn_track[mode][gfn] is introduced in the memory slot for every
guest page; it holds the tracking count for the guest page in each
mode. The count is increased when the page is tracked, and the page is
no longer tracked once the count reaches zero

We use 'unsigned short' as the tracking count, which should be enough as
the shadow page table can use at most 2^14 (2^3 for level, 2^1 for
cr4_pae, 2^2 for quadrant, 2^3 for access, 2^1 for nxe, 2^1 for cr0_wp,
2^1 for smep_andnot_wp, 2^1 for smap_andnot_wp, and 2^1 for smm), so
there is enough room for other trackers

Two callbacks, kvm_page_track_create_memslot() and
kvm_page_track_free_memslot(), are implemented in this patch; they are
used internally to initialize and reclaim the memory of the array

Currently, only write track mode is supported
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
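
The 2^14 bound quoted above is just the sum of the listed field widths. The short check below redoes that arithmetic; the array and function names are hypothetical, and the widths are copied from the message rather than from the kernel headers.

```c
#include <assert.h>
#include <limits.h>

/* Bit widths of the role fields, in the order listed in the message. */
static const unsigned role_bits[] = {
	3,	/* level */
	1,	/* cr4_pae */
	2,	/* quadrant */
	3,	/* access */
	1,	/* nxe */
	1,	/* cr0_wp */
	1,	/* smep_andnot_wp */
	1,	/* smap_andnot_wp */
	1,	/* smm */
};

/* Sum of all field widths. */
static unsigned total_role_bits(void)
{
	unsigned sum = 0;

	for (unsigned i = 0; i < sizeof(role_bits) / sizeof(role_bits[0]); i++)
		sum += role_bits[i];
	return sum;
}

/* Upper bound on distinct shadow pages that can track one gfn. */
static unsigned long max_shadow_pages_per_gfn(void)
{
	return 1UL << total_role_bits();
}
```

The widths add up to 14 bits, so the shadow MMU can contribute at most 16384 to a counter whose guaranteed minimum range is 65535, which leaves the headroom the message mentions.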

KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect · aeecee2e

Committed by Xiao Guangrong on Feb 24, 2016

Split rmap_write_protect() and introduce the function to abstract the write
protection based on the slot

This function will be used in a later patch
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage · 547ffaed

Committed by Xiao Guangrong on Feb 24, 2016

Abstract the common operations from account_shadowed() and
unaccount_shadowed(), then introduce kvm_mmu_gfn_disallow_lpage()
and kvm_mmu_gfn_allow_lpage()

These two functions will be used by page tracking in a later patch
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed · 92f94f1e

Committed by Xiao Guangrong on Feb 24, 2016

kvm_lpage_info->write_count is used to detect whether a large page mapping
for the gfn on the specified level is allowed. Rename it to disallow_lpage
to reflect its purpose, and rename has_wrprotected_page() to
mmu_gfn_lpage_is_disallowed() to make the code clearer

Later we will extend this mechanism for page tracking: if the gfn is
tracked then large mapping for that gfn on any level is not allowed.
The new name is more straightforward
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

openanolis / cloud-kernel: last synced successfully more than 1 year ago