1. 04 Dec, 2014 (5 commits)
  2. 23 Oct, 2014 (3 commits)
    •
      x86/xen: avoid race in p2m handling · 3a0e94f8
      Committed by Juergen Gross
      When a new p2m leaf is allocated this leaf is linked into the p2m tree
      via cmpxchg. Unfortunately the compare value for checking the success
      of the update is read after checking for the need of a new leaf. It is
      possible that a new leaf has been linked into the tree concurrently
      in between. This could lead to a leaked memory page and to the loss of
      some p2m entries.
      
      Avoid the race by using the read compare value for checking the need
      of a new p2m leaf and use ACCESS_ONCE() to get it.
      
      There are other places which seem to need ACCESS_ONCE() to ensure
      proper operation. Change them accordingly.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      3a0e94f8
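The race fixed above can be sketched in userspace C. This is a hypothetical analog, not the kernel code: the slot is read exactly once, and that single snapshot is used both for the "is a new leaf needed?" check and as the cmpxchg compare value.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace analog of the p2m fix. The key point from the
 * commit: read the slot ONCE (the ACCESS_ONCE()-style read), then use
 * that same snapshot both to decide whether a new leaf is needed and as
 * the cmpxchg compare value. Re-reading in between reopens the race and
 * leaks the newly allocated page. */
static _Atomic(void *) slot;

void *install_leaf(void *new_leaf)
{
    void *expected = atomic_load(&slot);   /* single read of the slot */

    if (expected != NULL)
        return expected;                   /* leaf already linked: reuse it */

    if (atomic_compare_exchange_strong(&slot, &expected, new_leaf))
        return new_leaf;                   /* we linked our leaf */

    return expected;                       /* lost the race: caller frees new_leaf */
}
```

If the load were repeated between the NULL check and the cmpxchg, a concurrently linked leaf could be silently replaced, which is exactly the leak the commit describes.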
    •
      x86/xen: delay construction of mfn_list_list · 2c185687
      Committed by Juergen Gross
      The 3 level p2m tree for the Xen tools is constructed very early at
      boot by calling xen_build_mfn_list_list(). Memory needed for this tree
      is allocated via extend_brk().
      
      As this tree (unlike the kernel-internal p2m tree) is only needed
      for domain save/restore, live migration and crash dump analysis, it
      doesn't matter whether it is constructed very early or just some
      milliseconds later when memory allocation is possible by other means.
      
      This patch moves the call of xen_build_mfn_list_list() just after
      calling xen_pagetable_p2m_copy() simplifying this function, too, as it
      doesn't have to bother with two parallel trees now. The same applies
      for some other internal functions.
      
      While simplifying code, make early_can_reuse_p2m_middle() static and
      drop the unused second parameter. p2m_mid_identity_mfn can be removed
      as well, it isn't used either.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      2c185687
    •
      x86/xen: avoid writing to freed memory after race in p2m handling · 239af7c7
      Committed by Juergen Gross
      When a race is detected during allocation of a new p2m tree
      element in alloc_p2m(), the newly allocated mid_mfn page is freed
      without updating the pointer to the value found in the tree. This
      results in the just-freed page being overwritten with the mfn of
      the p2m leaf.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      239af7c7
  3. 23 Sep, 2014 (1 commit)
    •
      xen/setup: Remap Xen Identity Mapped RAM · 4fbb67e3
      Committed by Matt Rushton
      Instead of ballooning dom0 memory up and down, this remaps the existing
      mfns that were replaced by the identity map. The reason for this is that
      the existing implementation ballooned memory up and down, which caused
      dom0 to have discontiguous pages. In some cases this resulted in the use
      of bounce buffers, which reduced network I/O performance significantly.
      This change honors the existing order of the pages, with the exception
      of some boundary conditions.
      
      To do this we need to update both the Linux p2m table and the Xen m2p table.
      Particular care must be taken when updating the p2m table since it's important
      to limit table memory consumption and reuse the existing leaf pages which get
      freed when an entire leaf page is set to the identity map. To implement this,
      mapping updates are grouped into blocks with table entries getting cached
      temporarily and then released.
      
      On my test system before:
      Total pages: 2105014
      Total contiguous: 1640635
      
      After:
      Total pages: 2105014
      Total contiguous: 2098904
      Signed-off-by: Matthew Rushton <mrushton@amazon.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      4fbb67e3
  4. 01 Aug, 2014 (1 commit)
  5. 15 May, 2014 (4 commits)
  6. 18 Mar, 2014 (1 commit)
  7. 03 Feb, 2014 (1 commit)
  8. 31 Jan, 2014 (1 commit)
    •
      xen/grant-table: Avoid m2p_override during mapping · 08ece5bb
      Committed by Zoltan Kiss
      The grant mapping API does m2p_override unnecessarily: only gntdev needs
      it; for blkback and future netback patches it just causes lock contention,
      as those pages never go to userspace. Therefore this series does the following:
      - the original functions were renamed to __gnttab_[un]map_refs, with a new
        parameter m2p_override
      - based on m2p_override either they follow the original behaviour, or just set
        the private flag and call set_phys_to_machine
      - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
        m2p_override false
      - a new function gnttab_[un]map_refs_userspace provides the old behaviour
      
      It also removes a stray space from page.h and changes ret to 0 if
      XENFEAT_auto_translated_physmap, as that is the only possible return
      value there.
      
      v2:
      - move the storing of the old mfn in page->index to gnttab_map_refs
      - move the function header update to a separate patch
      
      v3:
      - a new approach to retain the old behaviour where it is needed
      - squash the patches into one
      
      v4:
      - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
      - clear page->private before doing anything with the page, so m2p_find_override
        won't race with this
      
      v5:
      - change return value handling in __gnttab_[un]map_refs
      - remove a stray space in page.h
      - add detail why ret = 0 now at some places
      
      v6:
      - don't pass pfn to m2p* functions, just get it locally
      Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
      Suggested-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      08ece5bb
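The wrapper split described in the series can be sketched as follows. The function names follow the commit message; the bodies are illustrative stubs (returning which path was taken, for demonstration only), not the kernel implementation.

```c
#include <stdbool.h>

/* Sketch of the __gnttab_[un]map_refs split: one internal worker taking
 * an m2p_override flag, plus two thin entry points. Kernel users that
 * never expose pages to userspace (blkback, netback) take the fast path
 * and avoid the m2p_override lock contention the commit describes. */
enum map_path { PATH_FAST = 0, PATH_M2P_OVERRIDE = 1 };

static enum map_path __gnttab_map_refs_sketch(bool m2p_override)
{
    if (!m2p_override) {
        /* blkback/netback: just set the private flag and call
         * set_phys_to_machine -- no m2p_override lock taken */
        return PATH_FAST;
    }
    /* gntdev (userspace mappings): original m2p_override behaviour */
    return PATH_M2P_OVERRIDE;
}

enum map_path gnttab_map_refs_sketch(void)
{
    return __gnttab_map_refs_sketch(false);
}

enum map_path gnttab_map_refs_userspace_sketch(void)
{
    return __gnttab_map_refs_sketch(true);
}
```

Callers that previously used gnttab_map_refs for userspace mappings would switch to the _userspace variant; everyone else gets the cheaper path by default.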
  9. 06 Jan, 2014 (2 commits)
    •
      xen/pvh: Setup up shared_info. · 4dd322bc
      Committed by Mukesh Rathor
      For PVHVM the shared_info structure is provided in the same way
      as for normal PV guests (see include/xen/interface/xen.h).
      
      That is during bootup we get 'xen_start_info' via the %esi register
      in startup_xen. Then later we extract the 'shared_info' from said
      structure (in xen_setup_shared_info) and start using it.
      
      The 'xen_setup_shared_info' is all set up to work with auto-xlat
      guests, but there are two functions which it calls that are not:
      xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
      This patch modifies the P2M code (xen_setup_mfn_list_list)
      while the "Piggyback on PVHVM for event channels" patch modifies
      xen_setup_vcpu_info_placement.
      Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      4dd322bc
    •
      xen/pvh: Don't setup P2M tree. · 696fd7c5
      Committed by Konrad Rzeszutek Wilk
      P2M is not available for PVH. Fortunately for us the
      P2M code already has most of the support for auto-xlat guests, thanks to
      commit 3d24bbd7
      "grant-table: call set_phys_to_machine after mapping grant refs"
      which: "
      introduces set_phys_to_machine calls for auto_translated guests
      (even on x86) in gnttab_map_refs and gnttab_unmap_refs.
      translated by swiotlb-xen... " so we don't need to muck much.
      
      With the above mentioned commit "you'll get set_phys_to_machine calls
      from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do
      anything with them" (Stefano Stabellini), which is OK - we want
      them to be NOPs.
      
      This is because we assume that an "IOMMU is always present on the
      platform and Xen is going to make the appropriate IOMMU pagetable
      changes in the hypercall implementation of GNTTABOP_map_grant_ref
      and GNTTABOP_unmap_grant_ref, then everything should be transparent
      from the PVH privileged point of view and DMA transfers involving
      foreign pages keep working with no issues.

      Otherwise we would need a P2M (and an M2P) for the PVH privileged
      domain to track these foreign pages (see arch/arm/xen/p2m.c)."
      (Stefano Stabellini).
      
      We still have to inhibit the building of the P2M tree.
      That had been done in the past by not calling
      xen_build_dynamic_phys_to_machine (which sets up the P2M tree
      and gives us virtual addresses to access it). But we were missing
      a check for xen_build_mfn_list_list - which would continue to set up
      the P2M tree and would blow up trying to get the virtual
      address of p2m_missing (which would have been set up by
      xen_build_dynamic_phys_to_machine).
      
      Hence a check is needed to not call xen_build_mfn_list_list when
      running in auto-xlat mode.
      
      Instead of replicating the check for auto-xlat in enlighten.c,
      do it in the p2m.c code. The reason is that xen_build_mfn_list_list
      is also called in xen_arch_post_suspend without any checks for
      auto-xlat. So for PVH or PV with auto-xlat we would needlessly
      allocate space for a P2M tree.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      696fd7c5
  10. 10 Oct, 2013 (1 commit)
  11. 25 Sep, 2013 (1 commit)
    •
      xen/p2m: check MFN is in range before using the m2p table · 0160676b
      Committed by David Vrabel
      On hosts with more than 168 GB of memory, a 32-bit guest may attempt
      to grant map an MFN that it cannot look up in its mapping of the
      m2p table.  There is an m2p lookup as part of m2p_add_override() and
      m2p_remove_override().  The lookup falls off the end of the mapped
      portion of the m2p and (because the mapping is at the highest virtual
      address) wraps around, and the lookup causes a fault on what appears
      to be a user space address.
      
      do_page_fault() (thinking it's a fault to a userspace address) tries
      to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
      m2p_add_override() is called from gnttab_mmap() with mm->mmap_sem
      already locked.  do_page_fault() then deadlocks.
      
      The deadlock would most commonly occur when a 64-bit guest is started
      and xenconsoled attempts to grant map its console ring.
      
      Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
      mapped portion of the m2p table before accessing the table and use
      this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
      (which already had the correct range check).
      
      All faults caused by accessing the non-existent parts of the m2p are
      thus within the kernel address space and exception_fixup() is called
      without trying to lock mm->mmap_sem.
      
      This means that for MFNs that are outside the mapped range of the m2p
      then mfn_to_pfn() will always look in the m2p overrides.  This is
      correct because it must be a foreign MFN (and the PFN in the m2p in
      this case is only relevant for the other domain).
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      --
      v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
      v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
          range as it's probably foreign.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      0160676b
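The range check the commit introduces can be sketched like this. The table size and the INVALID sentinel are toy values for illustration, not the kernel constants.

```c
/* Sketch of the mfn_to_pfn_no_overrides() idea: bounds-check the MFN
 * against the mapped portion of the m2p before indexing it, so an
 * out-of-range (foreign) MFN returns INVALID instead of faulting at a
 * wrapped-around, userspace-looking address. */
#define M2P_ENTRIES  1024UL     /* assumed mapped m2p size, toy value */
#define INVALID_PFN  (~0UL)

static unsigned long m2p_table[M2P_ENTRIES];

unsigned long mfn_to_pfn_no_overrides_sketch(unsigned long mfn)
{
    if (mfn >= M2P_ENTRIES)
        return INVALID_PFN;     /* foreign MFN: caller consults the overrides */
    return m2p_table[mfn];
}
```

With this shape, the only faults possible are for MFNs inside the mapped range, which is what lets the kernel handle them via exception fixup instead of the userspace page-fault path.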
  12. 09 Sep, 2013 (1 commit)
  13. 20 Aug, 2013 (1 commit)
  14. 09 Aug, 2013 (1 commit)
  15. 12 Sep, 2012 (1 commit)
    •
      xen/m2p: do not reuse kmap_op->dev_bus_addr · 2fc136ee
      Committed by Stefano Stabellini
      If the caller passes a valid kmap_op to m2p_add_override, we use
      kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is
      part of the interface with Xen and if we are batching the hypercalls it
      might not have been written by the hypervisor yet. That means that later
      on Xen will write to it and we'll think that the original mfn is
      actually what Xen has written to it.
      
      Rather than "stealing" struct members from kmap_op, keep using
      page->index to store the original mfn and add another parameter to
      m2p_remove_override to get the corresponding kmap_op instead.
      It is now the responsibility of the caller to keep track of which kmap_op
      corresponds to a particular page in the m2p_override (gntdev, the only
      user of this interface that passes a valid kmap_op, is already doing that).
      
      CC: stable@kernel.org
      Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2fc136ee
  16. 05 Sep, 2012 (1 commit)
    •
      xen/p2m: Fix one-off error in checking the P2M tree directory. · 50e90041
      Committed by Konrad Rzeszutek Wilk
      We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES
      inclusive) when trying to figure out whether we can re-use some of the
      P2M middle leafs.
      
      Which meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
      we would try to use the 512th entry. Fortunately for us the p2m_top_index
      has a check for this:
      
       BUG_ON(pfn >= MAX_P2M_PFN);
      
      which we hit and saw this:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:    C ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff819cadeb>]
      (XEN) RFLAGS: 0000000000000212   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81db5000   rbx: ffffffff81db4000   rcx: 0000000000000000
      (XEN) rdx: 0000000000480211   rsi: 0000000000000000   rdi: ffffffff81db4000
      (XEN) rbp: ffffffff81793db8   rsp: ffffffff81793d38   r8:  0000000008000000
      (XEN) r9:  4000000000000000   r10: 0000000000000000   r11: ffffffff81db7000
      (XEN) r12: 0000000000000ff8   r13: ffffffff81df1ff8   r14: ffffffff81db6000
      (XEN) r15: 0000000000000ff8   cr0: 000000008005003b   cr4: 00000000000026f0
      (XEN) cr3: 0000000661795000   cr2: 0000000000000000
      
      Fixes-Oracle-Bug: 14570662
      CC: stable@vger.kernel.org # only for v3.5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      50e90041
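The off-by-one in miniature: with a 512-entry directory the valid indices are 0..511, but an inclusive upper bound walks a 513th slot, which is what the kernel's BUG_ON(pfn >= MAX_P2M_PFN) caught. The constant below is a toy stand-in for MAX_DOMAIN_PAGES.

```c
/* Sketch of the loop-bound bug: an inclusive bound (i <= N) visits one
 * entry past the end of an N-entry directory; the exclusive bound
 * (i < N) is correct. */
#define TOP_ENTRIES 512          /* stand-in for MAX_DOMAIN_PAGES */

int entries_visited(int inclusive_bound)
{
    int visited = 0;
    for (int i = 0; inclusive_bound ? i <= TOP_ENTRIES : i < TOP_ENTRIES; i++)
        visited++;               /* the buggy variant touches index 512 too */
    return visited;
}
```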
  17. 23 Aug, 2012 (3 commits)
    •
      xen/p2m: When revectoring deal with holes in the P2M array. · 3fc509fc
      Committed by Konrad Rzeszutek Wilk
      When we free the PFNs and then subsequently populate them back
      during bootup:
      
      Freeing 20000-20200 pfn range: 512 pages freed
      1-1 mapping on 20000->20200
      Freeing 40000-40200 pfn range: 512 pages freed
      1-1 mapping on 40000->40200
      Freeing bad80-badf4 pfn range: 116 pages freed
      1-1 mapping on bad80->badf4
      Freeing badf6-bae7f pfn range: 137 pages freed
      1-1 mapping on badf6->bae7f
      Freeing bb000-100000 pfn range: 282624 pages freed
      1-1 mapping on bb000->100000
      Released 283999 pages of unused memory
      Set 283999 page(s) to 1-1 mapping
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      We end up having the P2M array (that is, the one that was
      grafted on the P2M tree) filled with IDENTITY_FRAME or
      INVALID_P2M_ENTRY entries. The patch titled
      
      "xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID."
      recycles said slots and replaces the P2M tree leaf's with
       &mfn_list[xx] with p2m_identity or p2m_missing.
      
      And re-uses the P2M array sections for other P2M tree leaf's.
      For the above mentioned bootup excerpt, the PFNs at
      0x20000->0x20200 are going to be IDENTITY based:
      
      P2M[0][256][0] -> P2M[0][257][0] get turned into IDENTITY_FRAME.
      
      We can re-use that and replace P2M[0][256] to point to p2m_identity.
      The "old" page (the grafted P2M array provided by Xen) that was at
      P2M[0][256] gets put somewhere else. Specifically at P2M[6][358],
      b/c when we populate back:
      
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      we fill P2M[6][358][0] (and P2M[6][358], P2M[6][359], ...) with
      the new MFNs.
      
      That is all OK, except when we revector we assume that the PFN
      count would be the same in the grafted P2M array and in the
      newly allocated. Since that is no longer the case, as we have
      holes in the P2M that point to p2m_missing or p2m_identity we
      have to take that into account.
      
      [v2: Check for overflow]
      [v3: Move within the __va check]
      [v4: Fix the computation]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3fc509fc
    •
      xen/p2m: Add logic to revector a P2M tree to use __va leafs. · 357a3cfb
      Committed by Konrad Rzeszutek Wilk
      During bootup Xen supplies us with a P2M array. It sticks
      it right after the ramdisk, as can be seen with a 128GB PV guest:
      
      (certain parts removed for clarity):
      xc_dom_build_image: called
      xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff81e43000  (pfn 0x1000 + 0xe43 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
      xc_dom_alloc_segment:   ramdisk      : 0xffffffff81e43000 -> 0xffffffff925c7000  (pfn 0x1e43 + 0x10784 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
      xc_dom_alloc_segment:   phys2mach    : 0xffffffff925c7000 -> 0xffffffffa25c7000  (pfn 0x125c7 + 0x10000 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
      xc_dom_alloc_page   :   start info   : 0xffffffffa25c7000 (pfn 0x225c7)
      xc_dom_alloc_page   :   xenstore     : 0xffffffffa25c8000 (pfn 0x225c8)
      xc_dom_alloc_page   :   console      : 0xffffffffa25c9000 (pfn 0x225c9)
      nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
      nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
      xc_dom_alloc_segment:   page tables  : 0xffffffffa25ca000 -> 0xffffffffa26e1000  (pfn 0x225ca + 0x117 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
      xc_dom_alloc_page   :   boot stack   : 0xffffffffa26e1000 (pfn 0x226e1)
      xc_dom_build_image  : virt_alloc_end : 0xffffffffa26e2000
      xc_dom_build_image  : virt_pgtab_end : 0xffffffffa2800000
      
      So the physical memory and virtual (using __START_KERNEL_map addresses)
      layout looks as so:
      
        phys                             __ka
      /------------\                   /-------------------\
      | 0          | empty             | 0xffffffff80000000|
      | ..         |                   | ..                |
      | 16MB       | <= kernel starts  | 0xffffffff81000000|
      | ..         |                   |                   |
      | 30MB       | <= kernel ends => | 0xffffffff81e43000|
      | ..         |  & ramdisk starts | ..                |
      | 293MB      | <= ramdisk ends=> | 0xffffffff925c7000|
      | ..         |  & P2M starts     | ..                |
      | ..         |                   | ..                |
      | 549MB      | <= P2M ends    => | 0xffffffffa25c7000|
      | ..         | start_info        | 0xffffffffa25c7000|
      | ..         | xenstore          | 0xffffffffa25c8000|
      | ..         | console           | 0xffffffffa25c9000|
      | 549MB      | <= page tables => | 0xffffffffa25ca000|
      | ..         |                   |                   |
      | 550MB      | <= PGT end     => | 0xffffffffa26e1000|
      | ..         | boot stack        |                   |
      \------------/                   \-------------------/
      
      As can be seen, the ramdisk, P2M and pagetables are taking
      a bit of __ka address space. This is a problem since
      MODULES_VADDR starts at 0xffffffffa0000000 - and the P2M sits
      right in there! During bootup this results in the inability to
      load modules, with this error:
      
      ------------[ cut here ]------------
      WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
      Call Trace:
       [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
       [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
       [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
       [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
       [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c6186>] ? load_module+0x66/0x19c0
       [<ffffffff8105cadc>] module_alloc+0x5c/0x60
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
       [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
       [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
       [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
      ---[ end trace fd8f7704fdea0291 ]---
      vmalloc: allocation failure, allocated 16384 of 20480 bytes
      modprobe: page allocation failure: order:0, mode:0xd2
      
      Since the __va and __ka are 1:1 up to MODULES_VADDR and
      cleanup_highmap rids __ka of the ramdisk mapping, what
      we want to do is similar - get rid of the P2M in the __ka
      address space. There are two ways of fixing this:
      
       1) All P2M lookups instead of using the __ka address would
          use the __va address. This means we can safely erase from
          __ka space the PMD pointers that point to the PFNs for
          P2M array and be OK.
       2) Allocate a new array, copy the existing P2M into it,
          revector the P2M tree to use that, and return the old
          P2M to the memory allocator. This has the advantage that
          it sets the stage for using the XEN_ELF_NOTE_INIT_P2M
          feature. That feature allows us to set the exact virtual
          address space we want for the P2M - and allows us to
          boot as initial domain on large machines.
      
      So we pick option 2).
      
      This patch only lays the groundwork in the P2M code. The patch
      that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      357a3cfb
    •
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas." · 51faaf2b
      Committed by Konrad Rzeszutek Wilk
      
      This reverts commit 806c312e and
      commit 59b29440.
      
      And also documents setup.c and why we want to do it that way, which
      is that we tried to make the memblock_reserve more selective so
      that it would be clear what region is reserved. Sadly we ran
      into the problem wherein on a 64-bit hypervisor with a 32-bit
      initial domain, the pt_base has the cr3 value, which is not
      necessarily where the pagetable starts! As Jan put it: "
      Actually, the adjustment turns out to be correct: The page
      tables for a 32-on-64 dom0 get allocated in the order "first L1",
      "first L2", "first L3", so the offset to the page table base is
      indeed 2. When reading xen/include/public/xen.h's comment
      very strictly, this is not a violation (since there nothing is said
      that the first thing in the page table space is pointed to by
      pt_base; I admit that this seems to be implied though, namely
      do I think that it is implied that the page table space is the
      range [pt_base, pt_base + nt_pt_frames), whereas that
      range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames),
      which - without a priori knowledge - the kernel would have
      difficulty to figure out)." - so lets just fall back to the
      easy way and reserve the whole region.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      51faaf2b
  18. 22 Aug, 2012 (2 commits)
  19. 17 Aug, 2012 (1 commit)
    •
      xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID. · 250a41e0
      Committed by Konrad Rzeszutek Wilk
      If a P2M leaf is completely packed with INVALID_P2M_ENTRY or with
      1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf
      with either a p2m_missing or p2m_identity respectively. The old
      page (which was created via extend_brk or was grafted on from the
      mfn_list) can be re-used for setting new PFNs.
      
      This also means we can remove git commit:
      5bc6f988
      "xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back"
      which tried to fix this, and make the amount that is required to be
      reserved much smaller.
      
      CC: stable@vger.kernel.org # for 3.5 only.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      250a41e0
  20. 02 Aug, 2012 (1 commit)
    •
      xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back. · 5bc6f988
      Committed by Konrad Rzeszutek Wilk
      When we release pages back during bootup:
      
      Freeing  9d-100 pfn range: 99 pages freed
      Freeing  9cf36-9d0d2 pfn range: 412 pages freed
      Freeing  9f6bd-9f6bf pfn range: 2 pages freed
      Freeing  9f714-9f7bf pfn range: 171 pages freed
      Freeing  9f7e0-9f7ff pfn range: 31 pages freed
      Freeing  9f800-100000 pfn range: 395264 pages freed
      Released 395979 pages of unused memory
      
      We then try to populate those pages back. In the P2M tree however
      the space for those leafs must be reserved - as such we use extend_brk.
      We reserve 8MB of _brk space, which means we can fit over
      1048576 PFNs - which is more than we should ever need.
      
      Without this, on certain compilation of the kernel we would hit:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff818aad3b>]
      (XEN) RFLAGS: 0000000000000206   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81a7c000   rbx: 000000000000003d   rcx: 0000000000001000
      (XEN) rdx: ffffffff81a7b000   rsi: 0000000000001000   rdi: 0000000000001000
      (XEN) rbp: ffffffff81801cd8   rsp: ffffffff81801c98   r8:  0000000000100000
      (XEN) r9:  ffffffff81a7a000   r10: 0000000000000001   r11: 0000000000000003
      (XEN) r12: 0000000000000004   r13: 0000000000000004   r14: 000000000000003d
      (XEN) r15: 00000000000001e8   cr0: 000000008005003b   cr4: 00000000000006f0
      (XEN) cr3: 0000000125803000   cr2: 0000000000000000
      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
      (XEN) Guest stack trace from rsp=ffffffff81801c98:
      
      .. which is extend_brk hitting a BUG_ON.
      
      Interestingly enough, most of the time we are not going to hit this
      b/c the _brk space is quite large (v3.5):
       ffffffff81a25000 B __brk_base
       ffffffff81e43000 B __brk_limit
      = ~4MB.
      
      vs earlier kernels (with this back-ported), the space is smaller:
       ffffffff81a25000 B __brk_base
       ffffffff81a7b000 B __brk_limit
      = 344 kBytes.
      
      where we would certainly hit the BUG_ON in extend_brk.
      
      Note that git commit c3d93f88
      (xen: populate correct number of pages when across mem boundary (v2))
      exposed this bug.
      
      [v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion]
      
      CC: stable@vger.kernel.org #only for 3.5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5bc6f988
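The sizing arithmetic behind the 8 MB figure, assuming x86-64 constants (4 KiB pages, 8-byte p2m entries): each 4 KiB leaf page holds 512 entries, so 8 MB of _brk is 2048 leaf pages covering 1048576 PFNs, matching the "over 1048576 PFNs" in the message.

```c
/* Sketch of the reservation arithmetic: how many PFNs a given amount of
 * _brk space can cover when used as p2m leaf pages. Constants assume
 * x86-64 (4 KiB pages, sizeof(unsigned long) == 8). */
#define PAGE_SIZE_SKETCH  4096UL
#define ENTRIES_PER_LEAF  (PAGE_SIZE_SKETCH / sizeof(unsigned long))  /* 512 */

unsigned long pfns_covered(unsigned long brk_bytes)
{
    /* leaf pages that fit in the reservation, times PFNs per leaf */
    return (brk_bytes / PAGE_SIZE_SKETCH) * ENTRIES_PER_LEAF;
}
```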
  21. 15 Jun, 2012 (1 commit)
    •
      xen: mark local pages as FOREIGN in the m2p_override · b9e0d95c
      Committed by Stefano Stabellini
      When the frontend and the backend reside on the same domain, even if we
      add pages to the m2p_override, these pages will never be returned by
      mfn_to_pfn because the check "get_phys_to_machine(pfn) != mfn" will
      always fail, so the pfn of the frontend will be returned instead
      (resulting in a deadlock because the frontend pages are already locked).
      
      INFO: task qemu-system-i38:1085 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      qemu-system-i38 D ffff8800cfc137c0     0  1085      1 0x00000000
       ffff8800c47ed898 0000000000000282 ffff8800be4596b0 00000000000137c0
       ffff8800c47edfd8 ffff8800c47ec010 00000000000137c0 00000000000137c0
       ffff8800c47edfd8 00000000000137c0 ffffffff82213020 ffff8800be4596b0
      Call Trace:
       [<ffffffff81101ee0>] ? __lock_page+0x70/0x70
       [<ffffffff81a0fdd9>] schedule+0x29/0x70
       [<ffffffff81a0fe80>] io_schedule+0x60/0x80
       [<ffffffff81101eee>] sleep_on_page+0xe/0x20
       [<ffffffff81a0e1ca>] __wait_on_bit_lock+0x5a/0xc0
       [<ffffffff81101ed7>] __lock_page+0x67/0x70
       [<ffffffff8106f750>] ? autoremove_wake_function+0x40/0x40
       [<ffffffff811867e6>] ? bio_add_page+0x36/0x40
       [<ffffffff8110b692>] set_page_dirty_lock+0x52/0x60
       [<ffffffff81186021>] bio_set_pages_dirty+0x51/0x70
       [<ffffffff8118c6b4>] do_blockdev_direct_IO+0xb24/0xeb0
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff8118ca95>] __blockdev_direct_IO+0x55/0x60
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff811e91c8>] ext3_direct_IO+0xf8/0x390
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff81004b60>] ? xen_mc_flush+0xb0/0x1b0
       [<ffffffff81104027>] generic_file_aio_read+0x737/0x780
       [<ffffffff813bedeb>] ? gnttab_map_refs+0x15b/0x1e0
       [<ffffffff811038f0>] ? find_get_pages+0x150/0x150
       [<ffffffff8119736c>] aio_rw_vect_retry+0x7c/0x1d0
       [<ffffffff811972f0>] ? lookup_ioctx+0x90/0x90
       [<ffffffff81198856>] aio_run_iocb+0x66/0x1a0
       [<ffffffff811998b8>] do_io_submit+0x708/0xb90
       [<ffffffff81199d50>] sys_io_submit+0x10/0x20
       [<ffffffff81a18d69>] system_call_fastpath+0x16/0x1b
      
      The explanation is in the comment within the code:
      
      We need to do this because the pages shared by the frontend
      (xen-blkfront) can be already locked (lock_page, called by
      do_read_cache_page); when the userspace backend tries to use them
      with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
      do_blockdev_direct_IO is going to try to lock the same pages
      again resulting in a deadlock.
      
      A simplified call graph looks like this:
      
      pygrub                          QEMU
      -----------------------------------------------
      do_read_cache_page              io_submit
        |                              |
      lock_page                       ext3_direct_IO
                                       |
                                      bio_add_page
                                       |
                                      lock_page
      
      Internally the xen-blkback uses m2p_add_override to swizzle (temporarily)
      a 'struct page' to have a different MFN (so that it can point to another
      guest). It can also easily find out whether another pfn corresponding
      to the mfn exists in the m2p, and can set the FOREIGN bit
      in the p2m, making sure that mfn_to_pfn returns the pfn of the backend.
      
      This allows the backend to perform direct_IO on these pages, but as a
      side effect prevents the frontend from using get_user_pages_fast on
      them while they are being shared with the backend.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b9e0d95c
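A toy model of the lookup logic this commit relies on: mfn_to_pfn trusts the plain m2p answer only when the p2m round trip agrees, and setting the FOREIGN bit on the frontend's p2m entry deliberately breaks that round trip, steering the lookup to the m2p_override (the backend's pfn). Tables and indices below are illustrative, not the kernel layout.

```c
/* Toy m2p / p2m / override tables. In this sketch mfn 3 maps to the
 * frontend's pfn 7; the backend's override entry maps mfn 3 to pfn 9. */
#define FOREIGN_FRAME_BIT  (1UL << 63)
#define FOREIGN_FRAME(m)   ((m) | FOREIGN_FRAME_BIT)

static unsigned long m2p[16], p2m[16], m2p_override_tbl[16];

unsigned long mfn_to_pfn_sketch(unsigned long mfn)
{
    unsigned long pfn = m2p[mfn];
    if (p2m[pfn] == mfn)            /* round trip holds: local page */
        return pfn;
    return m2p_override_tbl[mfn];   /* mismatch (FOREIGN set): backend pfn */
}
```

Without the FOREIGN bit the round-trip check passes and the frontend's pfn comes back, reproducing the same-domain deadlock described above; with it set, the backend's pfn is returned and direct_IO proceeds.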
  22. 20 Apr, 2012 (1 commit)
  23. 18 Apr, 2012 (1 commit)
  24. 07 Apr, 2012 (4 commits)