Commits · ffb128c89b77b44da18ccf51844a8e750e2c427a · openanolis / cloud-kernel

14 Jul, 2016 · 3 commits

kvm: mmu: don't set the present bit unconditionally · ffb128c8

Authored by Bandan Das on Jul 12, 2016

To support execute only mappings on behalf of L1
hypervisors, we need to teach set_spte() to honor all three of
L1's XWR bits.  As a start, add a new variable "shadow_present_mask"
that will be set for non-EPT shadow paging and clear for EPT.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: mmu: remove is_present_gpte() · 812f30b2

Authored by Bandan Das on Jul 12, 2016

We have two versions of the above function.
To prevent confusion and bugs in the future, remove
the non-FNAME version entirely and replace all calls
with the actual check.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: mmu: extend the is_present check to 32 bits · 8d5cf161

Authored by Bandan Das on Jul 12, 2016

This is safe because this function is called
on host controlled page tables and non-present/non-MMIO
sptes never use bits 1..31. For the EPT case, this
ensures that cases where only the execute bit is set
are marked valid.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
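The check described in the commit above can be illustrated with a small user-space sketch. The bit positions, the MMIO marker value, and all `sketch_*` names are assumptions made for illustration and are not taken from the kernel source. The point is only that testing the low 32 bits, rather than bit 0 alone, treats an execute-only entry as present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for this sketch only (not the kernel's definitions). */
#define SKETCH_PRESENT_BIT (1ull << 0)             /* read/present bit */
#define SKETCH_EXEC_BIT    (1ull << 2)             /* EPT-style execute bit */
#define SKETCH_MMIO_MASK   ((3ull << 62) | 0x6ull) /* hypothetical MMIO marker */

/* An entry counts as MMIO only when every marker bit is set. */
static inline bool sketch_is_mmio_spte(uint64_t spte)
{
	return (spte & SKETCH_MMIO_MASK) == SKETCH_MMIO_MASK;
}

/* Narrow check: only bit 0 decides whether the entry is present. */
static inline bool sketch_is_present_bit0(uint64_t spte)
{
	return (spte & SKETCH_PRESENT_BIT) && !sketch_is_mmio_spte(spte);
}

/* Extended check: any of the low 32 bits marks the entry as present. */
static inline bool sketch_is_present_low32(uint64_t spte)
{
	return (spte & 0xFFFFFFFFull) && !sketch_is_mmio_spte(spte);
}
```

With these definitions, an entry that carries only the execute bit is rejected by the narrow check and accepted by the extended one, while an all-zero entry and an MMIO-marked entry are rejected by both.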

14 Jun, 2016 · 1 commit

KVM: x86: Fix typos · bb3541f1

Authored by Andrea Gelmini on May 21, 2016

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

02 Jun, 2016 · 1 commit

KVM: x86: avoid write-tearing of TDP · b19ee2ff

Authored by Nadav Amit on May 11, 2016

In theory, nothing prevents the compiler from write-tearing PTEs, or
splitting PTE writes. These partially-modified PTEs can be fetched by other
cores and cause mayhem. I have not really encountered such a case in
real life, but it does seem possible.

For example, the compiler may try to do something creative for
kvm_set_pte_rmapp() and perform multiple writes to the PTE.

Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
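One common way to rule out the torn write described above is to route the store through a volatile-qualified lvalue, which is the idea behind the kernel's `WRITE_ONCE()` macro for scalar types. The sketch below is a user-space illustration under that assumption; `sketch_set_spte` and `sketch_get_spte` are hypothetical names, and the single-store property can only hold where the target has a native 64-bit store.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Store the whole entry through a volatile lvalue so the compiler
 * may not split it into several narrower stores or emit it more than once.
 */
static inline void sketch_set_spte(uint64_t *sptep, uint64_t new_spte)
{
	*(volatile uint64_t *)sptep = new_spte;
}

/* Matching read side: one full-width load of the entry. */
static inline uint64_t sketch_get_spte(const uint64_t *sptep)
{
	return *(const volatile uint64_t *)sptep;
}
```

The volatile access constrains only the compiler. It adds no memory barrier, so any ordering against other accesses still has to be provided separately.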

06 May, 2016 · 1 commit

mm: thp: kvm: fix memory corruption in KVM with THP enabled · 127393fb

Authored by Andrea Arcangeli on May 05, 2016

After the THP refcounting change, obtaining a compound page from
get_user_pages() no longer allows us to assume the entire compound page
is immediately mappable from a secondary MMU.

A secondary MMU doesn't want to call get_user_pages() more than once for
each compound page, in order to know if it can map the whole compound
page. So a secondary MMU needs to know from a single get_user_pages()
invocation when it can map immediately the entire compound page to avoid
a flood of unnecessary secondary MMU faults and spurious
atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
users).

Ideally instead of the page->_mapcount < 1 check, get_user_pages()
should return the granularity of the "page" mapping in the "mm" passed
to get_user_pages(). However it's a non-trivial change to pass the "pmd"
status belonging to the "mm" walked by get_user_pages up the stack (up
to the caller of get_user_pages). So the fix just checks if there is
not a single pte mapping on the page returned by get_user_pages, and in
turn if the caller can assume that the whole compound page is mapped in
the current "mm" (in a pmd_trans_huge()). In such case the entire
compound page is safe to map into the secondary MMU without additional
get_user_pages() calls on the surrounding tail/head pages. In addition
to being faster, not having to run other get_user_pages() calls also
reduces the memory footprint of the secondary MMU fault in case the pmd
split happened as result of memory pressure.

Without this fix after a MADV_DONTNEED (like invoked by QEMU during
postcopy live migration or ballooning) or after generic swapping (with a
failure in split_huge_page() that would only result in pmd splitting and
not a physical page split), KVM would map the whole compound page into
the shadow pagetables, despite regular faults or userfaults (like
UFFDIO_COPY) may map regular pages into the primary MMU as result of the
pte faults, leading to the guest mode and userland mode going out of
sync and not working on the same memory at all times.

Any other secondary MMU notifier manager (KVM is just one of the many
MMU notifier users) will need the same information if it doesn't want to
run a flood of get_user_pages_fast and it can support multiple
granularity in the secondary MMU mappings, so I think it is justified to
be exposed not just to KVM.

The other option would be to move transparent_hugepage_adjust to
mm/huge_memory.c but that currently has all kinds of KVM data structures
in it, so it's definitely not cut-and-paste work, so I couldn't do a
fix as clean as this one for 4.6.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Li, Liang Z" <liang.z.li@intel.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

20 Apr, 2016 · 1 commit

KVM: MMU: skip obsolete sp in for_each_gfn_*() · 46971a2f

Authored by Xiao Guangrong on Mar 25, 2016

An obsolete sp should not be used on current vCPUs and should not affect
running vCPUs, so skip it in for_each_gfn_sp() and
for_each_gfn_indirect_valid_sp().

The side effect is that we will double check role.invalid in
kvm_mmu_get_page(), but I think it is okay as role is well cached.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

01 Apr, 2016 · 1 commit

kvm: set page dirty only if page has been writable · 14f47605

Authored by Yu Zhao on Mar 30, 2016

In the absence of a shadow dirty mask, there is no need to set the page
dirty if the page has never been writable. This is a tiny optimization but
good to have for people who care much about dirty page tracking.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
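The rule in the commit above can be sketched as a small predicate. This is an illustration only: the bit position, the `sketch_*` names, and the use of the old entry's writable bit as a stand-in for "has been writable" are assumptions, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_WRITABLE_MASK (1ull << 1) /* assumed writable bit */

/*
 * Decide whether the backing page should be marked dirty when an old
 * entry is dropped. dirty_mask == 0 models a configuration where the
 * entry format has no dirty bit; in that case fall back to the
 * writable bit instead of marking every page dirty.
 */
static inline bool sketch_needs_dirty(uint64_t old_spte, uint64_t dirty_mask)
{
	uint64_t mask = dirty_mask ? dirty_mask : SKETCH_WRITABLE_MASK;

	return (old_spte & mask) != 0;
}
```

Under these assumptions, a read-only entry never causes the page to be marked dirty when no dirty bit is available, which is the case the commit message calls out.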

31 Mar, 2016 · 1 commit

x86/cpufeature: Remove cpu_has_gbpages · b8291adc

Authored by Borislav Petkov on Mar 29, 2016

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459266123-21878-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

22 Mar, 2016 · 3 commits

KVM/x86: Replace smp_mb() with smp_store_mb/release() in the walk_shadow_page_lockless_begin/end() · 36ca7e0a

Authored by Lan Tianyu on Mar 13, 2016

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap_page() · 9753f529

Authored by Lan Tianyu on Mar 13, 2016

There is already a barrier inside of kvm_flush_remote_tlbs() which can
help to make sure everyone sees our modifications to the page tables and
sees changes to vcpu->mode here. So remove the smp_mb in the
kvm_mmu_commit_zap_page() and update the comment.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM, pkeys: introduce pkru_mask to cache conditions · 2d344105

Authored by Huaitong Han on Mar 22, 2016

PKEYS defines a new status bit in the PFEC: PFEC.PK (bit 5). If certain
conditions are true, the fault is considered a PKU violation.
pkru_mask indicates whether we need to check PKRU.ADi and PKRU.WDi, and
caches some conditions for permission_fault.

[ Huaitong: Xiao helps to modify many sections. ]

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

10 Mar, 2016 · 1 commit

KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 · 5f0b8199

Authored by Paolo Bonzini on Mar 09, 2016

KVM has special logic to handle pages with pte.u=1 and pte.w=0 when
CR0.WP=1. These pages' SPTEs flip continuously between two states:
U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed)
and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed).

When SMEP is in effect, however, U=0 will enable kernel execution of
this page. To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1.
When guest EFER has the NX bit cleared, the reserved bit check thinks
that the latter state is invalid; teach it that the smep_andnot_wp case
will also use the NX bit of SPTEs.

Cc: stable@vger.kernel.org
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.inel.com>
Fixes: c258b62b
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

08 Mar, 2016 · 8 commits

KVM: MMU: simplify last_pte_bitmap · 6bb69c9b

Authored by Paolo Bonzini on Feb 23, 2016

Branch-free code is fun and everybody knows how much Avi loves it,
but last_pte_bitmap takes it a bit to the extreme.  Since the code
is simply doing a range check, like

	(level == 1 ||
	 ((gpte & PT_PAGE_SIZE_MASK) && level < N))

we can make it branch-free without storing the entire truth table;
it is enough to cache N.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
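The range check quoted in the commit above can be written without branches by relying on unsigned wraparound. The sketch below is one way to do it under stated assumptions: the page-size flag is taken to be bit 7, levels and N are small (1 to 5), and every `sketch_*` name is hypothetical. It is not presented as the code the commit introduced.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_TABLE_LEVEL 1u        /* lowest level, always a leaf */
#define SKETCH_PAGE_SIZE_MASK   (1u << 7) /* assumed large-page flag */

/* Reference version: the range check as written in the commit message. */
static inline bool sketch_is_last_naive(unsigned level, unsigned gpte, unsigned n)
{
	return level == SKETCH_PAGE_TABLE_LEVEL ||
	       ((gpte & SKETCH_PAGE_SIZE_MASK) && level < n);
}

/* Branch-free version: only N is cached, no truth table. */
static inline bool sketch_is_last_branch_free(unsigned level, unsigned gpte, unsigned n)
{
	/* level - n wraps to a huge value (bit 7 set) only when level < n,
	 * so the page-size flag survives the AND only in that case. */
	gpte &= level - n;
	/* level - 1 - 1 wraps to all ones only when level == 1,
	 * which forces the flag on for the lowest level. */
	gpte |= level - SKETCH_PAGE_TABLE_LEVEL - 1;
	return (gpte & SKETCH_PAGE_SIZE_MASK) != 0;
}

/* Exhaustively compare both versions over the small input space. */
static inline void sketch_check_equivalence(void)
{
	for (unsigned n = 1; n <= 5; n++)
		for (unsigned level = 1; level <= 5; level++)
			for (unsigned gpte = 0; gpte < 256; gpte++)
				assert(sketch_is_last_naive(level, gpte, n) ==
				       sketch_is_last_branch_free(level, gpte, n));
}
```

The trick depends on the differences staying small compared with the flag's bit position. With levels up to 5 the non-wrapping results are at most 4, so bit 7 is set only in the wrapped cases.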

KVM: MMU: coalesce more page zapping in mmu_sync_children · 50c9e6f3

Authored by Paolo Bonzini on Feb 25, 2016

mmu_sync_children can only process up to 16 pages at a time.  Check
if we need to reschedule, and do not bother zapping the pages until
that happens.

Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: move zap/flush to kvm_mmu_get_page · 2a74003a

Authored by Paolo Bonzini on Feb 24, 2016

kvm_mmu_get_page is the only caller of kvm_sync_page_transient
and kvm_sync_pages.  Moving the handling of the invalid_list there
removes the need for the underdocumented kvm_sync_page_transient
function.

Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: invert return value of mmu.sync_page and *kvm_sync_page* · 1f50f1b3

Authored by Paolo Bonzini on Feb 24, 2016

Return true if the page was synced (and the TLB must be flushed)
and false if the page was zapped.

Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: cleanup __kvm_sync_page and its callers · 9a43c5d9

Authored by Paolo Bonzini on Feb 24, 2016

Calling kvm_unlink_unsync_page in the middle of __kvm_sync_page makes
things unnecessarily tricky.  If kvm_mmu_prepare_zap_page is called,
it will call kvm_unlink_unsync_page too.  So kvm_unlink_unsync_page can
be called just as well at the beginning or the end of __kvm_sync_page...
which means that we might do it in kvm_sync_page too and remove the
parameter.

kvm_sync_page ends up being the same code that kvm_sync_pages used
to have before the previous patch.

Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: use kvm_sync_page in kvm_sync_pages · df748f86

Authored by Paolo Bonzini on Feb 24, 2016

If the last argument is true, kvm_unlink_unsync_page is called anyway in
__kvm_sync_page (either by kvm_mmu_prepare_zap_page or by __kvm_sync_page
itself).  Therefore, kvm_sync_pages can just call kvm_sync_page, instead
of going through kvm_unlink_unsync_page+__kvm_sync_page.

Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: move TLB flush out of __kvm_sync_page · 35a70510

Authored by Paolo Bonzini on Feb 24, 2016

By doing this, kvm_sync_pages can use __kvm_sync_page instead of
reinventing it.  Because of kvm_mmu_flush_or_zap, the code does not
end up being more complex than before, and more cleanups to kvm_sync_pages
will come in the next patches.

Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: introduce kvm_mmu_flush_or_zap · b8c67b7a

Authored by Paolo Bonzini on Feb 24, 2016

This is a generalization of mmu_pte_write_flush_tlb, that also
takes care of calling kvm_mmu_commit_zap_page.  The next
patches will introduce more uses.

Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

04 Mar, 2016 · 3 commits

KVM: MMU: check kvm_mmu_pages and mmu_page_path indices · e23d3fef

Authored by Xiao Guangrong on Feb 24, 2016

Give a special invalid index to the root of the walk, so that we
can check the consistency of kvm_mmu_pages and mmu_page_path.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Extracted from a bigger patch proposed by Guangrong. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: Fix ubsan warnings · 0a47cd85

Authored by Paolo Bonzini on Feb 23, 2016

kvm_mmu_pages_init is doing some really yucky stuff. It is setting
up a sentinel for mmu_page_clear_parents; however, because of a) the
way levels are numbered starting from 1 and b) the way mmu_page_path
sizes its arrays with PT64_ROOT_LEVEL-1 elements, the access can be
out of bounds. This is harmless because the code overwrites up to the
first two elements of parents->idx and these are initialized, and
because the sentinel is not needed in this case---mmu_page_clear_parents
exits anyway when it gets to the end of the array. However ubsan
complains, and everyone else should too.

This fix does three things. First it makes the mmu_page_path arrays
PT64_ROOT_LEVEL elements in size, so that we can write to them without
checking the level in advance. Second it disintegrates kvm_mmu_pages_init
between mmu_unsync_walk (to reset the struct kvm_mmu_pages) and
for_each_sp (to place the NULL sentinel at the end of the current path).
This is okay because the mmu_page_path is only used in
mmu_pages_clear_parents; mmu_pages_clear_parents itself is called within
a for_each_sp iterator, and hence always after a call to mmu_pages_next.
Third it changes mmu_pages_clear_parents to just use the sentinel to
stop iteration, without checking the bounds on level.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: cleanup handle_abnormal_pfn · 798e88b3

Authored by Paolo Bonzini on Feb 23, 2016

The goto and temporary variable are unnecessary, just use return
statements.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

03 Mar, 2016 · 8 commits

KVM: MMU: apply page track notifier · 13d268ca

Authored by Xiao Guangrong on Feb 24, 2016

Register the notifier to receive write track events so that we can update
our shadow page table.

This makes kvm_mmu_pte_write() the callback of the notifier; no
functionality is changed.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: simplify mmu_need_write_protect · 5c520e90

Authored by Xiao Guangrong on Feb 24, 2016

Now that all non-leaf shadow pages are page tracked, if a gfn is not
tracked then no non-leaf shadow page exists for it, so we can directly
mark the shadow page of the gfn as unsync.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: use page track for non-leaf shadow pages · 56ca57f9

Authored by Xiao Guangrong on Feb 24, 2016

Non-leaf shadow pages are always write protected, so they can be
users of page track.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: clear write-flooding on the fast path of tracked page · e5691a81

Authored by Xiao Guangrong on Feb 24, 2016

If the page fault is caused by a write access to a write-tracked page, the
real shadow page walk is skipped, and we lose the chance to clear write
flooding for the page structure the current vCPU is using.

Fix it by locklessly walking the shadow page table to clear write flooding
on the shadow page structure outside of mmu-lock. For that reason, the
count is changed to atomic_t.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: let page fault handler be aware tracked page · 3d0c27ad

Authored by Xiao Guangrong on Feb 24, 2016

A page fault caused by a write access to a write-tracked page cannot
be fixed; it always needs to be emulated. page_fault_handle_page_track()
is the fast path we introduce here to skip holding mmu-lock and walking
the shadow page table.

However, if the page table is not present, it is worth making the page
table entry present and read-only so that read accesses succeed.

mmu_need_write_protect() needs to be adjusted to avoid the page becoming
writable when making the page table present or when syncing/prefetching
shadow page table entries.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect · aeecee2e

Authored by Xiao Guangrong on Feb 24, 2016

Split rmap_write_protect() and introduce the function to abstract the write
protection based on the slot.

This function will be used in a later patch.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage · 547ffaed

Authored by Xiao Guangrong on Feb 24, 2016

Abstract the common operations from account_shadowed() and
unaccount_shadowed(), then introduce kvm_mmu_gfn_disallow_lpage()
and kvm_mmu_gfn_allow_lpage().

These two functions will be used by page tracking in a later patch.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed · 92f94f1e

Authored by Xiao Guangrong on Feb 24, 2016

kvm_lpage_info->write_count is used to detect whether a large page mapping
for the gfn on the specified level is allowed. Rename it to disallow_lpage
to reflect its purpose, and also rename has_wrprotected_page() to
mmu_gfn_lpage_is_disallowed() to make the code clearer.

Later we will extend this mechanism for page tracking: if the gfn is
tracked then large mapping for that gfn on any level is not allowed.
The new name is more straightforward.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

24 Feb, 2016 · 1 commit

x86: Fix misspellings in comments · 6a6256f9

Authored by Adam Buchbinder on Feb 23, 2016

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

23 Feb, 2016 · 3 commits

KVM: x86: use list_last_entry · d74c0e6b

Authored by Geliang Tang on Jan 01, 2016

To make the intention clearer, use list_last_entry instead of
list_entry.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
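The readability point of the commit above can be seen in a stand-alone sketch of a circular doubly linked list. The `sketch_*` macros and types below are simplified user-space stand-ins written for this illustration; they mimic the shape of the kernel's `list_entry` and `list_last_entry` helpers but are not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (stand-in for list_head). */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

/* Recover the containing structure from a pointer to its list node. */
#define sketch_list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same result as sketch_list_entry((head)->prev, ...), but the name says so. */
#define sketch_list_last_entry(head, type, member) \
	sketch_list_entry((head)->prev, type, member)

/* Hypothetical payload type used only by this sketch. */
struct sketch_page {
	int id;
	struct sketch_list_head link;
};

static inline void sketch_list_init(struct sketch_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Append item just before head, i.e. at the tail of the list. */
static inline void sketch_list_add_tail(struct sketch_list_head *item,
					struct sketch_list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}
```

Both spellings yield the same pointer; the dedicated macro simply documents that the caller wants the tail element rather than an arbitrary node.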

KVM: x86: MMU: Move handle_mmio_page_fault() call to kvm_mmu_page_fault() · e9ee956e

Authored by Takuya Yoshikawa on Feb 22, 2016

Rather than placing a handle_mmio_page_fault() call in each
vcpu->arch.mmu.page_fault() handler, moving it up to
kvm_mmu_page_fault() makes the code better:

 - avoids code duplication
 - for kvm_arch_async_page_ready(), which is the other caller of
   vcpu->arch.mmu.page_fault(), removes an extra error_code check
 - avoids returning both RET_MMIO_PF_* values and raw integer values
   from vcpu->arch.mmu.page_fault()

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: MMU: Consolidate quickly_check_mmio_pf() and is_mmio_page_fault() · ded58749

Authored by Takuya Yoshikawa on Feb 22, 2016

These two have only slight differences:
 - whether 'addr' is of type u64 or of type gva_t
 - whether they have 'direct' parameter or not

Concerning the former, quickly_check_mmio_pf()'s u64 is better because
'addr' needs to be able to have both a guest physical address and a
guest virtual address.

The latter is just a stylistic issue as we can always calculate the mode
from the 'vcpu' as is_mmio_page_fault() does.  This patch keeps the
parameter to make the following patch cleaner.

In addition, the patch renames the function to mmio_info_in_cache() to
make it clear what it actually checks for.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

16 Jan, 2016 · 1 commit

kvm: rename pfn_t to kvm_pfn_t · ba049e93

Authored by Dan Williams on Jan 15, 2016

To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o.  It allows userspace to coordinate
DMA/RDMA from/to persistent memory.

The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver.  The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.

The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.

Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array.  Given the memmap array for a
large pool of persistent memory may exhaust available DRAM, introduce a
mechanism to allocate the memmap from persistent memory.  The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.

This patch (of 18):

The core has developed a need for a "pfn_t" type [1].  Move the existing
pfn_t in KVM to kvm_pfn_t [2].

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

07 Jan, 2016 · 1 commit

kvm: x86: fix comment about {mmu,nested_mmu}.gva_to_gpa · 0af2593b

Authored by David Matlack on Dec 30, 2015

The comment had the meaning of mmu.gva_to_gpa and nested_mmu.gva_to_gpa
swapped. Fix that, and also add some details describing how each translation
works.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

19 Dec, 2015 · 1 commit

KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table() · 77492664

Authored by Takuya Yoshikawa on Dec 18, 2015

Not just in order to clean up the code, but to make it faster by using
enhanced instructions: the initialization became 20-30% faster on our
testing machine.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

26 Nov, 2015 · 1 commit

KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() · bb11c6c9

Authored by Takuya Yoshikawa on Nov 26, 2015

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

openanolis / cloud-kernel · synced successfully more than 1 year ago