Commits · 7f9f44308c8993c9ab8078d174dad34bea3e82d7 · gsplhtlxg / clone-Linux

24 Apr, 2015 · 5 commits

x86: fix special __probe_kernel_write() tail zeroing case · d869844b

Committed by Linus Torvalds on Apr 23, 2015

Commit cae2a173 ("x86: clean up/fix 'copy_in_user()' tail zeroing")
fixed the failure case tail zeroing of one special case of the x86-64
generic user-copy routine, namely when used for the user-to-user case
("copy_in_user()").

But in the process it broke an even more unusual case: using the user
copy routine for kernel-to-kernel copying.

Now, normally kernel-kernel copies are obviously done using memcpy(),
but we have a couple of special cases when we use the user-copy
functions. One is when we pass a kernel buffer to a regular user-buffer
routine, using set_fs(KERNEL_DS). That's a "normal" case, and continued
to work fine, because it never takes any faults (with the possible
exception of a silent and successful vmalloc fault).

But Jan Beulich pointed out another, very unusual, special case: when we
use the user-copy routines not because it's a path that expects a user
pointer, but for a couple of ftrace/kgdb cases that want to do a kernel
copy, but do so using "unsafe" buffers, and use the user-copy routine to
gracefully handle faults. IOW, for probe_kernel_write().

And that broke for the case of a faulting kernel destination, because we
saw the kernel destination and wanted to try to clear the tail of the
buffer. Which doesn't work, since that's what faults.

This only triggers for things like kgdb and ftrace users (e.g. trying to
set a breakpoint on read-only memory), but it's definitely a bug.
The fix is to not compare against the kernel address start (TASK_SIZE),
but instead use the same limits "access_ok()" uses.

Reported-and-tested-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org # 4.0
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
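
The change of limit described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel's code: `SKETCH_TASK_SIZE`, `SKETCH_USER_DS`, `SKETCH_KERNEL_DS` and both helper functions are hypothetical names, and the model assumes that the probe_kernel_write() path runs with the address limit raised so that it covers kernel addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration; not the kernel's real constants. */
#define SKETCH_TASK_SIZE  0x00007ffffffff000ULL /* fixed user/kernel split */
#define SKETCH_USER_DS    SKETCH_TASK_SIZE      /* limit for normal user copies */
#define SKETCH_KERNEL_DS  UINT64_MAX            /* limit while kernel copies are allowed */

/* Old behaviour in this model: any destination above the fixed split is
 * treated as a kernel buffer whose tail must be zeroed after a fault. */
static bool sketch_zero_tail_old(uint64_t dst)
{
	return dst >= SKETCH_TASK_SIZE;
}

/* New behaviour in this model: compare against the limit that an
 * access_ok()-style check would use for the current copy. */
static bool sketch_zero_tail_new(uint64_t dst, uint64_t addr_limit)
{
	return dst >= addr_limit;
}
```

In this model a faulting kernel destination copied under `SKETCH_KERNEL_DS` is no longer selected for tail zeroing, while an ordinary user-to-kernel copy under `SKETCH_USER_DS` still is.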

nios2: rework cache · 1a70db49

Committed by Ley Foon Tan on Apr 24, 2015

- flush dcache before flushing the instruction cache
- rework update_mmu_cache and flush_dcache_page
- add shmparam.h

Signed-off-by: Ley Foon Tan <lftan@altera.com>

nios2: Add types.h header required for __u32 type · 2009337e

Committed by Ezequiel Garcia on Apr 24, 2015

Reported by the header checker (CONFIG_HEADERS_CHECK=y):

  CHECK   usr/include/asm/ (31 files)
./usr/include/asm/ptrace.h:77: found __[us]{8,16,32,64} type without #include <linux/types.h>

Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Acked-by: Ley Foon Tan <lftan@altera.com>

eth: bf609 eth clock: add pclk clock for stmmac driver probe · d91e14b3

Committed by Steven Miao on Apr 24, 2015

Signed-off-by: Steven Miao <realmz6@gmail.com>

blackfin: Wire up missing syscalls · 4f650a59

Committed by Chen Gang on Apr 13, 2015

The missing syscalls are listed below; they can break the samples/kdbus
build in the next-20150401 tree. The related warnings and error:

    CALL    scripts/checksyscalls.sh
  <stdin>:1223:2: warning: #warning syscall kcmp not implemented [-Wcpp]
  <stdin>:1226:2: warning: #warning syscall finit_module not implemented [-Wcpp]
  <stdin>:1229:2: warning: #warning syscall sched_setattr not implemented [-Wcpp]
  <stdin>:1232:2: warning: #warning syscall sched_getattr not implemented [-Wcpp]
  <stdin>:1235:2: warning: #warning syscall renameat2 not implemented [-Wcpp]
  <stdin>:1238:2: warning: #warning syscall seccomp not implemented [-Wcpp]
  <stdin>:1241:2: warning: #warning syscall getrandom not implemented [-Wcpp]
  <stdin>:1244:2: warning: #warning syscall memfd_create not implemented [-Wcpp]
  <stdin>:1247:2: warning: #warning syscall bpf not implemented [-Wcpp]
  <stdin>:1250:2: warning: #warning syscall execveat not implemented [-Wcpp]
  [...]
    HOSTCC  samples/kdbus/kdbus-workers
  samples/kdbus/kdbus-workers.c: In function ‘prime_new’:
  samples/kdbus/kdbus-workers.c:930:18: error: ‘__NR_memfd_create’ undeclared (first use in this function)
    p->fd = syscall(__NR_memfd_create, "prime-area", MFD_CLOEXEC);
                    ^
  samples/kdbus/kdbus-workers.c:930:18: note: each undeclared identifier is reported only once for each function it appears in

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>

23 Apr, 2015 · 17 commits

arch: blackfin: kernel: kgdb: Remove unused function · b9061ef5

Committed by Rickard Strandqvist on Jan 01, 2015

Remove the function kgdb_post_primary_code() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Acked-by: Sonic Zhang <sonic.zhang@analog.com>

dma: fix build error after update to v3.19 · 37557178

Committed by Steven Miao on Mar 06, 2015

Signed-off-by: Steven Miao <realmz6@gmail.com>

blackfin: io: define __raw_readx/writex with bfin_readx/writex · 1a3372bc

Committed by Steven Miao on Jan 20, 2015

Signed-off-by: Steven Miao <realmz6@gmail.com>

bf609: add resources for lcd nl8048 · b3df664b

Committed by Scott Jiang on Dec 11, 2014

Signed-off-by: Scott Jiang <scott.jiang.linux@gmail.com>
Signed-off-by: Steven Miao <realmz6@gmail.com>

pm: sometimes wake up from suspend to RAM would fail · ef7dcaf1

Committed by Aaron Wu on Oct 22, 2014

Waking up from suspend to RAM sometimes fails. This is because we flush
the data cache with the FLUSHINV assembly instruction before suspending
to RAM, and there is a delay between executing this instruction and
completion of the cache flush. Add a 1 us delay to work around this.

Signed-off-by: Aaron Wu <Aaron.wu@analog.com>

debug-mmrs: Eliminate all traces of the USB_PHY_TEST MMR · bb717b33

Committed by Andre Wolokita on Sep 05, 2014

Interacting with the USB_PHY_TEST MMR through debugfs was causing wide-spread
chaos in the realm (kernel panic). Expunge all references to this demonic
register.

Signed-off-by: Andre Wolokita <Andre.Wolokita@analog.com>

bf609: remove softswitch i2c configuration from adv7842 and adv7511 platform data · f7fee036

Committed by Sonic Zhang on Aug 21, 2014

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>

bf609: add platform data for soft switch devices on the video extenders · 199aad16

Committed by Sonic Zhang on Aug 20, 2014

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>

bf609: enable soft switch gpio driver by default · 707e6f0b

Committed by Sonic Zhang on Aug 20, 2014

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>

bf609: add gpio soft switch platform data for mcp23017 i2c devices · ea9b706b

Committed by Sonic Zhang on Aug 20, 2014

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>

bf609: use new SND_BF6XX_PCM to choose audio pcm driver · 374feb1f

Committed by Scott Jiang on Jul 31, 2014

There is a new bf6xx audio dma driver, so we no longer reuse the
bf5xx i2s pcm driver.

Signed-off-by: Scott Jiang <scott.jiang.linux@gmail.com>

bug[220] kgdb: change the smp cross core function entry · a0f4207d

Committed by Sonic Zhang on Jul 29, 2014

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>

arch: blackfin: kernel: setup.c: Cleaning up missing null-terminate in conjunction with strncpy · 4eb147c8

Committed by Rickard Strandqvist on Jul 26, 2014

Replace strncpy with strlcpy to avoid strings that lack a null terminator.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Steven Miao <realmz6@gmail.com>
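
The termination problem this commit addresses can be shown in user space. The sketch below is not the kernel's strlcpy(); `sketch_strlcpy` and `sketch_is_terminated` are hypothetical helpers that only mimic strlcpy-like semantics (always terminate when the size is non-zero, return the source length).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper with strlcpy-like semantics: copies at most size - 1
 * bytes, always terminates dst when size > 0, and returns strlen(src). */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = (len >= size) ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* Returns 1 if buf[0..size-1] contains a terminating NUL, 0 otherwise. */
static int sketch_is_terminated(const char *buf, size_t size)
{
	return memchr(buf, '\0', size) != NULL;
}
```

Copying "blackfin" into a 4-byte buffer with strncpy() fills all four bytes with 'b', 'l', 'a', 'c' and leaves no terminator, whereas the helper above stores "bla" plus a NUL and returns 8 so the caller can detect the truncation.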

blackfin: defconfigs: cleanup unused CONFIG_MTD_CHAR, add MTD_SPI_NOR for BF537-STAMP · 2fcc440c

Committed by Steven Miao on Jul 29, 2014

Signed-off-by: Steven Miao <realmz6@gmail.com>

powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled · 2e826695

Committed by Aneesh Kumar K.V on Apr 21, 2015

This fixes the build error below:

arch/powerpc/mm/hash_utils_64.c: In function ‘flush_hash_hugepage’:
arch/powerpc/mm/hash_utils_64.c:1381:1: error: label at end of compound statement
 tm_abort:
 ^
make[1]: *** [arch/powerpc/mm/hash_utils_64.o] Error 1

Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
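
The compiler error quoted above arises when a label is followed only by code that the preprocessor removes, so that the label ends up directly before a closing brace. The sketch below shows one common way to avoid it; it is an illustration under assumptions, not the actual patch. `SKETCH_TRANSACTIONAL_MEM` is a hypothetical stand-in for the config option, and `sketch_flush` is an invented function.

```c
/* Hypothetical stand-in for an optional feature; deliberately left
 * undefined so that the guarded code below is compiled out. */
/* #define SKETCH_TRANSACTIONAL_MEM 1 */

static void sketch_flush(int skip_flush, int *flushed, int *aborted)
{
	if (skip_flush)
		goto tm_abort;
	*flushed = 1;
tm_abort:
#ifdef SKETCH_TRANSACTIONAL_MEM
	*aborted = 1; /* feature-specific work would go here */
#endif
	return; /* without this statement the label would directly precede '}' */
}
```

With the macro undefined, deleting the final `return;` reproduces the "label at end of compound statement" diagnostic on compilers that follow the pre-C23 rule that a label must be attached to a statement.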

frv: add io{read,write}{16,32}be functions · 04fca0e3

Committed by Guenter Roeck on Apr 20, 2015

These functions are used in various drivers, including the latest
version of the 8250 driver. The latter causes the following build
failure.

drivers/tty/serial/8250/8250_core.c: In function 'mem32be_serial_out':
drivers/tty/serial/8250/8250_core.c:456:2: error:
			implicit declaration of function 'iowrite32be'
drivers/tty/serial/8250/8250_core.c: In function 'mem32be_serial_in':
drivers/tty/serial/8250/8250_core.c:462:2: error:
			implicit declaration of function 'ioread32be'

Cc: Kevin Cernekee <cernekee@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c627f2ce ("serial: 8250: Add support for big-endian MMIO accesses")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rob Herring <robh@kernel.org>
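
What a big-endian 16/32-bit accessor has to do can be modelled in user space with a plain byte buffer standing in for device registers. The `sketch_` functions below are hypothetical and are not the kernel's ioread16be()/iowrite32be() implementations; they only show the byte ordering such accessors are expected to produce regardless of host endianness.

```c
#include <stdint.h>

/* Hypothetical user-space model: "MMIO" is just a byte buffer here. */
static uint16_t sketch_ioread16be(const volatile uint8_t *addr)
{
	return (uint16_t)(((uint16_t)addr[0] << 8) | addr[1]);
}

static uint32_t sketch_ioread32be(const volatile uint8_t *addr)
{
	return ((uint32_t)addr[0] << 24) | ((uint32_t)addr[1] << 16) |
	       ((uint32_t)addr[2] << 8)  |  (uint32_t)addr[3];
}

static void sketch_iowrite16be(uint16_t val, volatile uint8_t *addr)
{
	addr[0] = (uint8_t)(val >> 8);   /* most significant byte first */
	addr[1] = (uint8_t)val;
}

static void sketch_iowrite32be(uint32_t val, volatile uint8_t *addr)
{
	addr[0] = (uint8_t)(val >> 24);  /* most significant byte first */
	addr[1] = (uint8_t)(val >> 16);
	addr[2] = (uint8_t)(val >> 8);
	addr[3] = (uint8_t)val;
}
```

Writing 0x11223344 through `sketch_iowrite32be` stores the bytes 11 22 33 44 in ascending address order, and `sketch_ioread32be` reads the same value back.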

mn10300: add io{read,write}{16,32}be functions · 601e3ad9

Committed by Guenter Roeck on Apr 20, 2015

These functions are used in various drivers, including the latest
version of the 8250 driver. The latter causes the following build failure.

drivers/tty/serial/8250/8250_core.c: In function 'mem32be_serial_out':
drivers/tty/serial/8250/8250_core.c:456:2: error:
			implicit declaration of function 'iowrite32be'
drivers/tty/serial/8250/8250_core.c: In function 'mem32be_serial_in':
drivers/tty/serial/8250/8250_core.c:462:2: error:
			implicit declaration of function 'ioread32be'

Cc: Kevin Cernekee <cernekee@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c627f2ce ("serial: 8250: Add support for big-endian MMIO accesses")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rob Herring <robh@kernel.org>

22 Apr, 2015 · 8 commits

KVM: arm/arm64: check IRQ number on userland injection · fd1d0ddf

Committed by Andre Przywara on Apr 10, 2015

When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently
only check it against a fixed limit, which historically is set
to 127. With the new dynamic IRQ allocation the effective limit may
actually be smaller (64).
So when a malicious or buggy userland now injects a SPI in that
range, we spill over on our VGIC bitmaps and bytemaps memory.
I could trigger a host kernel NULL pointer dereference with current
mainline by injecting some bogus IRQ number from a hacked kvmtool:
-----------------
....
DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
DEBUG: IRQ #114 still in the game, writing to bytemap now...
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffffffc07652e000
[00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
Internal error: Oops: 96000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
Hardware name: FVP Base (DT)
task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
PC is at kvm_vgic_inject_irq+0x234/0x310
LR is at kvm_vgic_inject_irq+0x30c/0x310
pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145
.....

So this patch fixes this by checking the SPI number against the
actual limit. Also we remove the former legacy hard limit of
127 in the ioctl code.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18
[maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
as suggested by Christopher Covington]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
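
The difference between checking against a fixed legacy limit and checking against the number of interrupts actually allocated can be sketched as follows. This is a simplified model, not the VGIC code: `SKETCH_LEGACY_IRQ_MAX`, `sketch_irq_ok_legacy` and `sketch_irq_ok` are hypothetical names, and `nr_irqs` stands for whatever per-VM limit the dynamic allocation produced.

```c
#include <stdbool.h>

#define SKETCH_LEGACY_IRQ_MAX 127 /* the fixed limit mentioned in the commit */

/* Old-style check in this model: only the fixed legacy limit is applied. */
static bool sketch_irq_ok_legacy(unsigned int irq)
{
	return irq <= SKETCH_LEGACY_IRQ_MAX;
}

/* New-style check in this model: nr_irqs is the number of interrupts
 * actually allocated for this VM, which may be smaller than the legacy cap. */
static bool sketch_irq_ok(unsigned int irq, unsigned int nr_irqs)
{
	return irq < nr_irqs;
}
```

With the numbers from the trace above, IRQ 114 passes the legacy check but is rejected once the check uses an effective limit of 64.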

ARM: qcom: add description of KPSS WDT for IPQ8064 · 4ba1c98b

Committed by Mathieu Olivari on Feb 20, 2015

Add the watchdog related entries to the Krait Processor Sub-system
(KPSS) timer IPQ8064 devicetree section. Also, add a fixed-clock
description of SLEEP_CLK, which will do for now.

Signed-off-by: Josh Cartwright <joshc@codeaurora.org>
Signed-off-by: Mathieu Olivari <mathieu@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

ia64/PCI: Treat all host bridge Address Space Descriptors (even consumers) as windows · 9fbbda5c

Committed by Bjorn Helgaas on Apr 21, 2015

Prior to c770cb4c ("PCI: Mark invalid BARs as unassigned"), if we tried
to claim a PCI BAR but could not find an upstream bridge window that
matched it, we complained but still allowed the device to be enabled.

c770cb4c broke devices that previously worked (mptsas and igb in the
case Tony reported, but it could be any devices) because it marks those
BARs as IORESOURCE_UNSET, which makes pci_enable_device() complain and
return failure:

igb 0000:81:00.0: can't enable device: BAR 0 [mem size 0x00020000] not assigned
igb: probe of 0000:81:00.0 failed with error -22

The underlying cause is an ACPI Address Space Descriptor for a PCI host
bridge window that is marked as "consumer". This is a firmware defect:
resources that are produced on the downstream side of a bridge should be
marked "producer". But rejecting these BARs that we previously allowed is
a functionality regression, and firmware has not used the producer/consumer
bit consistently, so we can't rely on it anyway.

Stop checking the producer/consumer bit, and assume all bridge Address
Space Descriptors are for bridge windows.

Note that this change does not affect I/O Port or Fixed Location I/O Port
Descriptors, which are commonly used for the [io 0x0cf8-0x0cff] config
access range. That range is a "consumer" range and should not be treated
as a window.

Fixes: c770cb4c ("PCI: Mark invalid BARs as unassigned")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96961
Reported-and-tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

sparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context · 0edfad59

Committed by Sowmini Varadhan on Apr 21, 2015

Since it is possible for vnet_event_napi to end up doing
vnet_control_pkt_engine -> ... -> vnet_send_attr ->
vnet_port_alloc_tx_ring -> ldc_alloc_exp_dring -> kzalloc()
(i.e., in softirq context), kzalloc() should be called with
GFP_ATOMIC from ldc_alloc_exp_dring.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc64: Use M7 PMC write on all chips T4 and onward. · df386375

Committed by David S. Miller on Apr 21, 2015

They both work equally well, and the M7 implementation is
simpler and cheaper (fewer register writes).

With help from David Ahern.

Signed-off-by: David S. Miller <davem@davemloft.net>

parisc: Replace PT_NLEVELS with CONFIG_PGTABLE_LEVELS · c19edb69

Committed by Guenter Roeck on Apr 15, 2015

The following warning is seen when compiling parisc images

./arch/parisc/include/asm/pgalloc.h: In function 'pgd_alloc':
./arch/parisc/include/asm/pgalloc.h:29:5: warning: "PT_NLEVELS" is not defined

Some definitions of PT_NLEVELS were missed with the conversion to
CONFIG_PGTABLE_LEVELS.

Fixes: f24ffde4 ("parisc: expose number of page table levels on Kconfig level")
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Helge Deller <deller@gmx.de>

parisc: Eliminate sg_virt_addr() and private scatterlist.h · 8bf8a1d1

Committed by Matthew Wilcox on Mar 20, 2015

The only reason to keep parisc's private asm/scatterlist.h was that it
had the macro sg_virt_addr().  Convert all callers to use something else
(sometimes just sg->offset was enough, others should use sg_virt()), and
we can just use the asm-generic scatterlist.h instead.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Dave Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>

KVM: VMX: Preserve host CR4.MCE value while in guest mode. · 085e68ee

Committed by Ben Serebrin on Apr 16, 2015

The host's decision to enable machine check exceptions should remain
in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
and passed a slightly-modified 0 to the vmcs.guest_cr4 value.

Tested: Built.
On earlier version, tested by injecting machine check
while a guest is spinning.

Before the change, if guest CR4.MCE==0, then the machine check is
escalated to Catastrophic Error (CATERR) and the machine dies.
If guest CR4.MCE==1, then the machine check causes VMEXIT and is
handled normally by host Linux. After the change, injecting a machine
check causes normal Linux machine check handling.

Signed-off-by: Ben Serebrin <serebrin@google.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
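
The idea of keeping the host's machine-check setting in force while a guest runs can be sketched as a simple bit operation. This is an illustration only and not KVM's code: `SKETCH_CR4_MCE` and `sketch_hw_guest_cr4` are hypothetical names, and the model assumes the value loaded into hardware is the guest's requested CR4 with the host's MCE bit carried over.

```c
#define SKETCH_CR4_MCE (1UL << 6) /* Machine-Check Enable is bit 6 of CR4 */

/* Combine the CR4 value the guest asked for with the host's MCE setting,
 * so that the guest's choice cannot switch machine checks off in hardware. */
static unsigned long sketch_hw_guest_cr4(unsigned long guest_cr4,
					 unsigned long host_cr4)
{
	return guest_cr4 | (host_cr4 & SKETCH_CR4_MCE);
}
```

In this model a guest that writes 0 to CR4 still runs with MCE set whenever the host has it set, which matches the "after the change" behaviour described above.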

21 Apr, 2015 · 10 commits

ARM: 8344/1: VDSO: honor CONFIG_VDSO in Makefile · f80f6531

Committed by Nathan Lynch on Apr 18, 2015

When CONFIG_VDSO=n, the build normally does not enter arch/arm/vdso/
because arch/arm/Makefile does not add it to core-y.

However, if the user runs 'make arch/arm/vdso/' the VDSO targets will
get visited.  This is because the VDSO Makefile itself does not
consider the value of CONFIG_VDSO.

It is arguably better and more consistent behavior to generate an
empty built-in.o when CONFIG_VDSO=n and the user attempts to build
arch/arm/vdso/.  It's nicer because it doesn't try to build things
that Kconfig dependencies are there to prevent (e.g. the dependency on
AEABI), and it's less confusing than building objects that won't be
used in the final image.

Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ARM: 8343/1: VDSO: add build artifacts to .gitignore · 2b507a2d

Committed by Nathan Lynch on Apr 18, 2015

vdsomunge and vdso.so.raw are outputs that don't get matched by the
normal ignore rules.

Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ARM: Fix nommu booting · 0a9024e8

Committed by Russell King on Apr 19, 2015

Commit bf35706f ("ARM: 8314/1: replace PROCINFO embedded branch with
relative offset") broke booting on nommu platforms as it didn't update
the nommu boot code. This patch fixes that oversight.

Fixes: bf35706f ("ARM: 8314/1: replace PROCINFO embedded branch with relative offset")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 · 66feed61

Committed by Paul Mackerras on Mar 28, 2015

This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.

Aggregated statistics from debugfs across vcpus for a guest with 32
vcpus, 8 threads/vcore, running on a POWER8, show this before the
change:

 rm_entry:     3387.6ns (228 - 86600, 1008969 samples)
  rm_exit:     4561.5ns (12 - 3477452, 1009402 samples)
  rm_intr:     1660.0ns (12 - 553050, 3600051 samples)

and this after the change:

 rm_entry:     3060.1ns (212 - 65138, 953873 samples)
  rm_exit:     4244.1ns (12 - 9693408, 954331 samples)
  rm_intr:     1342.3ns (12 - 1104718, 3405326 samples)

for a test of booting Fedora 20 big-endian to the login prompt.

The time taken for a H_PROD hcall (which is handled in the host
kernel) went down from about 35 microseconds to about 16 microseconds
with this change.

The noinline added to kvmppc_run_core turned out to be necessary for
good performance, at least with gcc 4.9.2 as packaged with Fedora 21
and a little-endian POWER8 host.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C · eddb60fb

Committed by Paul Mackerras on Mar 28, 2015

This replaces the assembler code for kvmhv_commence_exit() with C code
in book3s_hv_builtin.c. It also moves the IPI sending code that was
in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it
can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq().

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Streamline guest entry and exit · 6af27c84

Committed by Paul Mackerras on Mar 28, 2015

On entry to the guest, secondary threads now wait for the primary to
switch the MMU after loading up most of their state, rather than before.
This means that the secondary threads get into the guest sooner, in the
common case where the secondary threads get to kvmppc_hv_entry before
the primary thread.

On exit, the first thread out increments the exit count and interrupts
the other threads (to get them out of the guest) before saving most
of its state, rather than after.  That means that the other threads
exit sooner and means that the first thread doesn't spend so much
time waiting for the other threads at the point where the MMU gets
switched back to the host.

This pulls out the code that increments the exit count and interrupts
other threads into a separate function, kvmhv_commence_exit().
This also makes sure that r12 and vcpu->arch.trap are set correctly
in some corner cases.

Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
improvement.  Aggregating across vcpus for a guest with 32 vcpus,
8 threads/vcore, running on a POWER8, gives this before the change:

 rm_entry:     avg 4537.3ns (222 - 48444, 1068878 samples)
  rm_exit:     avg 4787.6ns (152 - 165490, 1010717 samples)
  rm_intr:     avg 1673.6ns (12 - 341304, 3818691 samples)

and this after the change:

 rm_entry:     avg 3427.7ns (232 - 68150, 1118921 samples)
  rm_exit:     avg 4716.0ns (12 - 150720, 1119477 samples)
  rm_intr:     avg 1614.8ns (12 - 522436, 3850432 samples)

showing a substantial reduction in the time spent per guest entry in
the real-mode guest entry code, and smaller reductions in the real
mode guest exit and interrupt handling times.  (The test was to start
the guest and boot Fedora 20 big-endian to the login prompt.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
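
The size of the improvements quoted in this commit can be checked with a little arithmetic on the reported averages. The helper below is a hypothetical convenience function, not part of the kernel or of the measurement tooling.

```c
/* Percentage by which an average time dropped, relative to the
 * "before" value. Both arguments are averages in nanoseconds. */
static double sketch_percent_reduction(double before_ns, double after_ns)
{
	return 100.0 * (before_ns - after_ns) / before_ns;
}
```

Applied to the averages above, this gives roughly 24.5% for rm_entry (4537.3 ns to 3427.7 ns), about 1.5% for rm_exit (4787.6 ns to 4716.0 ns) and about 3.5% for rm_intr (1673.6 ns to 1614.8 ns), consistent with the description of a substantial entry-time reduction and smaller exit and interrupt reductions.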

KVM: PPC: Book3S HV: Use bitmap of active threads rather than count · 7d6c40da

Committed by Paul Mackerras on Mar 28, 2015

Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each. The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
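
Why a pair of bitmaps is more useful than a pair of counts can be seen in a small model. The layout and names below are assumptions made for this sketch only (low byte for threads that have entered, next byte for threads that have started exiting); they are not taken from the kvmppc_vcore definition.

```c
#include <stdint.h>

/* Assumed layout for this sketch only: bits 0-7 are the "entered" bitmap,
 * bits 8-15 the "exiting" bitmap, one bit per hardware thread. */
#define SKETCH_ENTRY_SHIFT 0
#define SKETCH_EXIT_SHIFT  8

static uint16_t sketch_mark_entered(uint16_t map, unsigned int thread)
{
	return (uint16_t)(map | (1u << (SKETCH_ENTRY_SHIFT + thread)));
}

static uint16_t sketch_mark_exiting(uint16_t map, unsigned int thread)
{
	return (uint16_t)(map | (1u << (SKETCH_EXIT_SHIFT + thread)));
}

/* Threads that entered the guest, have not started exiting, and are not
 * the caller: exactly the ones that still need to be signalled. */
static uint8_t sketch_threads_to_signal(uint16_t map, unsigned int self)
{
	uint8_t entered = (uint8_t)(map >> SKETCH_ENTRY_SHIFT);
	uint8_t exiting = (uint8_t)(map >> SKETCH_EXIT_SHIFT);

	return (uint8_t)(entered & ~exiting & ~(1u << self));
}
```

A count would only say that, for example, three threads entered and one is exiting; the bitmap says which threads those are, so the exiting thread can signal exactly the others.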

KVM: PPC: Book3S HV: Use decrementer to wake napping threads · fd6d53b1

Committed by Paul Mackerras on Mar 28, 2015

This arranges for threads that are napping due to their vcpu having
ceded or due to not having a vcpu to wake up at the end of the guest's
timeslice without having to be poked with an IPI.  We do that by
arranging for the decrementer to contain a value no greater than the
number of timebase ticks remaining until the end of the timeslice.
In the case of a thread with no vcpu, this number is in the hypervisor
decrementer already.  In the case of a ceded vcpu, we use the smaller
of the HDEC value and the DEC value.

Using the DEC like this when ceded means we need to save and restore
the guest decrementer value around the nap.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI · ccc07772

Committed by Paul Mackerras on Mar 28, 2015

When running a multi-threaded guest and vcpu 0 in a virtual core
is not running in the guest (i.e. it is busy elsewhere in the host),
thread 0 of the physical core will switch the MMU to the guest and
then go to nap mode in the code at kvm_do_nap.  If the guest sends
an IPI to thread 0 using the msgsndp instruction, that will wake
up thread 0 and cause all the threads in the guest to exit to the
host unnecessarily.  To avoid the unnecessary exit, this arranges
for the PECEDP bit to be cleared in this situation.  When napping
due to a H_CEDE from the guest, we still set PECEDP so that the
thread will wake up on an IPI sent using msgsndp.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken · 5d5b99cd

Committed by Paul Mackerras on Mar 28, 2015

We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>