1. 29 January 2015 (2 commits)
  2. 21 January 2015 (1 commit)
  3. 13 January 2015 (2 commits)
    • ARM: 8255/1: perf: Prevent wraparound during overflow · 2d9ed740
      Committed by Daniel Thompson
      If the overflow threshold for a counter is set above or near the
      0xffffffff boundary then the kernel may lose track of the overflow,
      causing only events that occur *after* the overflow to be recorded.
      Specifically, the problem occurs when the value of the performance
      counter overtakes its original programmed value due to wraparound.
      
      Typical solutions to this problem are either to avoid programming in
      values likely to be overtaken or to treat the overflow bit as the 33rd
      bit of the counter.
      
      It's somewhat fiddly to refactor the code to correctly handle the 33rd
      bit during irqsave sections (context switches for example) so instead
      we take the simpler approach of avoiding values likely to be overtaken.
      
      We set the limit to half of max_period because this matches the limit
      imposed in __hw_perf_event_init(). This causes a doubling of the
      interrupt rate for large threshold values; however, even with a very
      fast counter ticking at 4GHz the interrupt rate would only be ~1Hz.
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
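      A minimal sketch of the clamp described above (illustrative userspace
      C, not the kernel patch itself; MAX_PERIOD and the helper name are
      made up for the example):

		#include <stdint.h>
		#include <stdio.h>

		/* The ARM PMU counts up and interrupts on overflow, so a
		 * period is programmed as (u32)(-left). If 'left' is close
		 * to max_period, the live counter can overtake the freshly
		 * programmed value before the overflow fires. Clamping to
		 * max_period/2 keeps ~2^31 counts of headroom. */
		#define MAX_PERIOD 0xffffffffULL

		static uint32_t program_counter_value(uint64_t left)
		{
			if (left > (MAX_PERIOD >> 1))	/* the fix */
				left = MAX_PERIOD >> 1;
			return (uint32_t)(MAX_PERIOD - left + 1);
		}

		int main(void)
		{
			printf("left=0x10000    -> 0x%08x\n",
			       program_counter_value(0x10000));
			printf("left=0xfffffff0 -> 0x%08x (clamped)\n",
			       program_counter_value(0xfffffff0ULL));
			return 0;
		}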
    • ARM: 8266/1: Remove early stack deallocation from restore_user_regs · a18f3645
      Committed by Daniel Thompson
      Currently restore_user_regs deallocates the SVC stack early in
      its execution and relies on no exception being taken between
      the deallocation and the registers being restored. The introduction
      of a default FIQ handler that also uses the SVC stack breaks this
      assumption and can result in corrupted register state.
      
      This patch works around the problem by removing the early
      stack deallocation and using r2 as a temporary instead. I have
      not found a way to do this without introducing an extra mov
      instruction to the macro.
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  4. 10 January 2015 (1 commit)
  5. 08 January 2015 (3 commits)
    • ARM: 8253/1: mm: use phys_addr_t type in map_lowmem() for kernel mem region · ac084688
      Committed by Grygorii Strashko
      Currently the local variables kernel_x_start and kernel_x_end are
      defined using the 'unsigned long' type, which is wrong because they
      represent a physical memory range and will be truncated if LPAE is
      enabled. As a result, all following code in map_lowmem() will not
      work correctly.
      
      For example, Keystone 2 boot is broken because
       kernel_x_start == 0x0000 0000
       kernel_x_end   == 0x0080 0000
      
      instead of
       kernel_x_start == 0x0000 0008 0000 0000
       kernel_x_end   == 0x0000 0008 0080 0000
      and as a result the whole of low memory is mapped with MT_MEMORY_RW
      permissions by the code below (start >= kernel_x_end):
      		} else if (start >= kernel_x_end) {
      			map.pfn = __phys_to_pfn(start);
      			map.virtual = __phys_to_virt(start);
      			map.length = end - start;
      			map.type = MT_MEMORY_RW;
      
      			create_mapping(&map);
      		}
      
      Hence, fix it by using the phys_addr_t type for the variables
      kernel_x_start and kernel_x_end.
      Tested-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
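      The fix itself is a one-line type change; the fragment below (plain
      userspace C, with values taken from the commit message) just
      demonstrates the truncation that 'unsigned long' causes under LPAE:

		#include <stdint.h>
		#include <stdio.h>

		int main(void)
		{
			/* Keystone 2 kernel base, a 64-bit physical address */
			uint64_t kernel_x_start = 0x0000000800000000ULL;
			/* what a 32-bit 'unsigned long' on ARM retains */
			uint32_t truncated = (uint32_t)kernel_x_start;

			printf("phys_addr_t  : 0x%016llx\n",
			       (unsigned long long)kernel_x_start);
			printf("unsigned long: 0x%08x (top bits lost)\n",
			       truncated);
			return 0;
		}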
    • ARM: 8249/1: mm: dump: don't skip regions · cca547e9
      Committed by Mark Rutland
      Currently the arm page table dumping code starts dumping page tables
      from USER_PGTABLES_CEILING. This is unnecessary for skipping any
      entries related to userspace, as swapper_pg_dir does not contain such
      entries, and it results in a couple of unfortunate side effects.
      
      Firstly, any kernel mappings which might exist below
      USER_PGTABLES_CEILING will not be accounted in the dump output. This
      masks any entries erroneously created below this address.
      
      Secondly, if the final page table entry walked is part of a valid
      mapping the page table dumping code will not log the region this entry
      is part of, as the final note_page call in walk_pgd will trigger an
      early return when 0 < USER_PGTABLES_CEILING. Luckily this isn't seen on
      contemporary systems as they typically don't have enough RAM to extend
      the linear mapping right to the end of the address space.
      
      Due to the way addr is constructed in the walk_* functions, it can never
      be less than USER_PGTABLES_CEILING when walking the page tables, so it
      is not necessary to avoid dereferencing invalid table addresses. The
      existing checks for st->current_prot and st->marker[1].start_address are
      sufficient to ensure we will not print and/or dereference garbage when
      trying to log information.
      
      This patch removes both problematic uses of USER_PGTABLES_CEILING from
      the arm page table dumping code, preventing both of these issues. We
      will now report any low mappings, and the final note_page call will not
      return early, ensuring all regions are logged.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
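      A hedged sketch of the shape of the change, reconstructed from the
      commit message (simplified; the exact arch/arm/mm/dump.c code may
      differ):

		static void walk_pgd(struct seq_file *m)
		{
			/* was: swapper_pg_dir + pgd_index(USER_PGTABLES_CEILING) */
			pgd_t *pgd = swapper_pg_dir;
			struct pg_state st = { .seq = m, .marker = address_markers };
			unsigned long addr;
			unsigned int i;

			for (i = 0; i < PTRS_PER_PGD; i++, pgd++) {
				/* was: USER_PGTABLES_CEILING + i * PGDIR_SIZE */
				addr = i * PGDIR_SIZE;
				if (!pgd_none(*pgd))
					walk_pud(&st, pgd, addr);
				else
					note_page(&st, addr, 1, pgd_val(*pgd));
			}

			/* flush the final region; no longer cut short by an
			 * early return now that the walk starts at 0 */
			note_page(&st, 0, 0, 0);
		}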
    • ARM: wire up execveat syscall · 841ee230
      Committed by Russell King
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
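      A short usage sketch for the newly wired syscall (assumes a libc
      that defines SYS_execveat; invoked via syscall(2) since libc
      wrappers postdated this commit):

		#define _GNU_SOURCE
		#include <fcntl.h>
		#include <unistd.h>
		#include <sys/syscall.h>

		#ifndef AT_EMPTY_PATH
		#define AT_EMPTY_PATH 0x1000
		#endif

		int main(void)
		{
			/* execveat() can exec the file behind an fd itself */
			int fd = open("/bin/true", O_PATH | O_CLOEXEC);
			char *argv[] = { "true", NULL };
			char *envp[] = { NULL };

			if (fd >= 0)
				syscall(SYS_execveat, fd, "", argv, envp,
					AT_EMPTY_PATH);
			return 1; /* reached only on failure */
		}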
  6. 20 December 2014 (15 commits)
  7. 19 December 2014 (2 commits)
  8. 18 December 2014 (12 commits)
  9. 17 December 2014 (2 commits)
    • KVM: PPC: Book3S HV: Improve H_CONFER implementation · 90fd09f8
      Committed by Sam Bobroff
      Currently the H_CONFER hcall is implemented in kernel virtual mode,
      meaning that whenever a guest thread does an H_CONFER, all the threads
      in that virtual core have to exit the guest.  This is bad for
      performance because it interrupts the other threads even if they
      are doing useful work.
      
      The H_CONFER hcall is called by a guest VCPU when it is spinning on a
      spinlock and it detects that the spinlock is held by a guest VCPU that
      is currently not running on a physical CPU.  The idea is to give this
      VCPU's time slice to the holder VCPU so that it can make progress
      towards releasing the lock.
      
      To avoid having the other threads exit the guest unnecessarily,
      we add a real-mode implementation of H_CONFER that checks whether
      the other threads are doing anything.  If all the other threads
      are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER),
      it returns H_TOO_HARD which causes a guest exit and allows the
      H_CONFER to be handled in virtual mode.
      
      Otherwise it spins for a short time (up to 10 microseconds) to give
      other threads the chance to observe that this thread is trying to
      confer.  The spin loop also terminates when any thread exits the guest
      or when all other threads are idle or trying to confer.  If the
      timeout is reached, the H_CONFER returns H_SUCCESS.  In this case the
      guest VCPU will recheck the spinlock word and most likely call
      H_CONFER again.
      
      This also improves the implementation of the H_CONFER virtual mode
      handler.  If the VCPU is part of a virtual core (vcore) which is
      runnable, there will be a 'runner' VCPU which has taken responsibility
      for running the vcore.  In this case we yield to the runner VCPU
      rather than the target VCPU.
      
      We also introduce a check on the target VCPU's yield count: if it
      differs from the yield count passed to H_CONFER, the target VCPU
      has run since H_CONFER was called and may have already released
      the lock.  This check is required by PAPR.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
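      A hedged sketch of the real-mode handler described above (helper and
      field names approximate the upstream patch; treat it as a sketch of
      the spin-and-punt logic, not the definitive implementation):

		int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int target,
				       unsigned int yield_count)
		{
			struct kvmppc_vcore *vc = vcpu->arch.vcore;
			/* spin for at most ~10 microseconds */
			u64 stop = get_tb() + 10 * tb_ticks_per_usec;
			int rv = H_SUCCESS;	/* default: don't yield */

			set_bit(vcpu->arch.ptid, &vc->conferring_threads);
			while (get_tb() < stop && !VCORE_EXIT_COUNT(vc)) {
				int running    = VCORE_ENTRY_COUNT(vc);
				int ceded      = hweight32(vc->napping_threads);
				int conferring = hweight32(vc->conferring_threads);

				/* all other threads idle or conferring:
				 * exit the guest and confer in virtual mode */
				if (ceded + conferring >= running) {
					rv = H_TOO_HARD;
					break;
				}
			}
			clear_bit(vcpu->arch.ptid, &vc->conferring_threads);

			return rv;
		}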
    • KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register · 4a157d61
      Committed by Paul Mackerras
      There are two ways in which a guest instruction can be obtained from
      the guest in the guest exit code in book3s_hv_rmhandlers.S.  If the
      exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal
      instruction), the offending instruction is in the HEIR register
      (Hypervisor Emulation Instruction Register).  If the exit was caused
      by a load or store to an emulated MMIO device, we load the instruction
      from the guest by turning data relocation on and loading the instruction
      with an lwz instruction.
      
      Unfortunately, in the case where the guest has opposite endianness to
      the host, these two methods give results of different endianness, but
      both get put into vcpu->arch.last_inst.  The HEIR value has been loaded
      using guest endianness, whereas the lwz will load the instruction using
      host endianness.  The rest of the code that uses vcpu->arch.last_inst
      assumes it was loaded using host endianness.
      
      To fix this, we define a new vcpu field to store the HEIR value.  Then,
      in kvmppc_handle_exit_hv(), we transfer the value from this new field to
      vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness
      differ.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
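      A hedged sketch of the transfer in kvmppc_handle_exit_hv() (the
      field name emul_inst follows the upstream patch; surrounding
      details may differ slightly):

		case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
			/* HEIR was captured in guest endianness; normalize
			 * to host order before the common code uses it */
			vcpu->arch.last_inst = kvmppc_need_byteswap(vcpu) ?
				swab32(vcpu->arch.emul_inst) :
				vcpu->arch.emul_inst;
			break;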