提交 · c37079586f317d7e7f1a70d36f0e5177691c89c2 · openeuler / Kernel

24 7月, 2011 2 次提交

KVM: MMU: remove bypass_guest_pf · c3707958

由 Xiao Guangrong 提交于 7月 12, 2011

The idea is from Avi:
| Maybe it's time to kill off bypass_guest_pf=1.  It's not as effective as
| it used to be, since unsync pages always use shadow_trap_nonpresent_pte,
| and since we convert between the two nonpresent_ptes during sync and unsync.
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

c3707958

KVM guest: KVM Steal time registration · d910f5c1

由 Glauber Costa 提交于 7月 11, 2011

This patch implements the kvm bits of the steal time infrastructure.
The most important part of it, is the steal time clock. It is an
continuous clock that shows the accumulated amount of steal time
since vcpu creation. It is supposed to survive cpu offlining/onlining.

[marcelo: fix build with CONFIG_KVM_GUEST=n]
Signed-off-by: NGlauber Costa <glommer@redhat.com>
Acked-by: NRik van Riel <riel@redhat.com>
Tested-by: NEric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Avi Kivity <avi@redhat.com>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

d910f5c1

12 7月, 2011 13 次提交

KVM: KVM Steal time guest/host interface · 9ddabbe7

由 Glauber Costa 提交于 7月 11, 2011

To implement steal time, we need the hypervisor to pass the guest information
about how much time was spent running other processes outside the VM.
This is per-vcpu, and using the kvmclock structure for that is an abuse
we decided not to make.

In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that
holds the memory area address containing information about steal time

This patch contains the headers for it. I am keeping it separate to facilitate
backports to people who wants to backport the kernel part but not the
hypervisor, or the other way around.
Signed-off-by: NGlauber Costa <glommer@redhat.com>
Acked-by: NRik van Riel <riel@redhat.com>
Tested-by: NEric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

9ddabbe7

KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests · aa04b4cc

由 Paul Mackerras 提交于 6月 29, 2011

This adds infrastructure which will be needed to allow book3s_hv KVM to
run on older POWER processors, including PPC970, which don't support
the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
Offset (RMO) facility. These processors require a physically
contiguous, aligned area of memory for each guest. When the guest does
an access in real mode (MMU off), the address is compared against a
limit value, and if it is lower, the address is ORed with an offset
value (from the Real Mode Offset Register (RMOR)) and the result becomes
the real address for the access. The size of the RMA has to be one of
a set of supported values, which usually includes 64MB, 128MB, 256MB
and some larger powers of 2.

Since we are unlikely to be able to allocate 64MB or more of physically
contiguous memory after the kernel has been running for a while, we
allocate a pool of RMAs at boot time using the bootmem allocator. The
size and number of the RMAs can be set using the kvm_rma_size=xx and
kvm_rma_count=xx kernel command line options.

KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
of the pool of preallocated RMAs. The capability value is 1 if the
processor can use an RMA but doesn't require one (because it supports
the VRMA facility), or 2 if the processor requires an RMA for each guest.

This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
pool and returns a file descriptor which can be used to map the RMA. It
also returns the size of the RMA in the argument structure.

Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION
ioctl calls from userspace. To cope with this, we now preallocate the
kvm->arch.ram_pginfo array when the VM is created with a size sufficient
for up to 64GB of guest memory. Subsequently we will get rid of this
array and use memory associated with each memslot instead.

This moves most of the code that translates the user addresses into
host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
to kvmppc_core_prepare_memory_region. Also, instead of having to look
up the VMA for each page in order to check the page size, we now check
that the pages we get are compound pages of 16MB. However, if we are
adding memory that is mapped to an RMA, we don't bother with calling
get_user_pages_fast and instead just offset from the base pfn for the
RMA.

Typically the RMA gets added after vcpus are created, which makes it
inconvenient to have the LPCR (logical partition control register) value
in the vcpu->arch struct, since the LPCR controls whether the processor
uses RMA or VRMA for the guest. This moves the LPCR value into the
kvm->arch struct and arranges for the MER (mediated external request)
bit, which is the only bit that varies between vcpus, to be set in
assembly code when going into the guest if there is a pending external
interrupt request.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

aa04b4cc

KVM: PPC: Allow book3s_hv guests to use SMT processor modes · 371fefd6

由 Paul Mackerras 提交于 6月 29, 2011

This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7. The host still has to run single-threaded.

This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability. The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.

To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode. KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline). To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it. Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.

When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host. This number is exported
to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.

We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host. We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked. This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.

When a vcore starts to run, it executes in the context of one of the
vcpu threads. The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).

It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running. In that case it can start to run immediately as long as
the none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest. It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.

Note that there is no fixed relationship between the hardware thread
number and the vcpu number. Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

371fefd6

KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode · 54738c09

由 David Gibson 提交于 6月 29, 2011

This improves I/O performance for guests using the PAPR
paravirtualization interface by making the H_PUT_TCE hcall faster, by
implementing it in real mode.  H_PUT_TCE is used for updating virtual
IOMMU tables, and is used both for virtual I/O and for real I/O in the
PAPR interface.

Since this moves the IOMMU tables into the kernel, we define a new
KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables.  The
ioctl returns a file descriptor which can be used to mmap the newly
created table.  The qemu driver models use them in the same way as
userspace managed tables, but they can be updated directly by the
guest with a real-mode H_PUT_TCE implementation, reducing the number
of host/guest context switches during guest IO.

There are certain circumstances where it is useful for userland qemu
to write to the TCE table even if the kernel H_PUT_TCE path is used
most of the time.  Specifically, allowing this will avoid awkwardness
when we need to reset the table.  More importantly, we will in the
future need to write the table in order to restore its state after a
checkpoint resume or migration.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

54738c09

KVM: PPC: Add support for Book3S processors in hypervisor mode · de56a948

由 Paul Mackerras 提交于 6月 29, 2011

This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.

This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.

Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.

This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.

With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.

We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.

In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.

The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.

This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.

This also adds a few new exports needed by the book3s_hv code.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

de56a948

KVM: PPC: e500: enable magic page · a4cd8b23

由 Scott Wood 提交于 6月 14, 2011

This is a shared page used for paravirtualization.  It is always present
in the guest kernel's effective address space at the address indicated
by the hypercall that enables it.

The physical address specified by the hypercall is not used, as
e500 does not have real mode.
Signed-off-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

a4cd8b23

KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0 · 411c588d

由 Avi Kivity 提交于 6月 06, 2011

When CR0.WP=0, we sometimes map user pages as kernel pages (to allow
the kernel to write to them).  Unfortunately this also allows the kernel
to fetch from these pages, even if CR4.SMEP is set.

Adjust for this by also setting NX on the spte in these circumstances.
Signed-off-by: NAvi Kivity <avi@redhat.com>

411c588d

KVM: Fix KVM_ASSIGN_SET_MSIX_ENTRY documentation · 58f0964e

由 Jan Kiszka 提交于 6月 11, 2011

The documented behavior did not match the implemented one (which also
never changed).
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

58f0964e

KVM: Clarify KVM_ASSIGN_PCI_DEVICE documentation · 91e3d71d

由 Jan Kiszka 提交于 6月 03, 2011

Neither host_irq nor the guest_msi struct are used anymore today.
Tag the former, drop the latter to avoid confusion.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

91e3d71d

KVM: Fixup documentation section numbering · 7f4382e8

由 Jan Kiszka 提交于 6月 02, 2011

Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

7f4382e8

KVM: Document KVM_IOEVENTFD · 55399a02

由 Sasha Levin 提交于 5月 28, 2011

Document KVM_IOEVENTFD that can be used to receive
notifications of PIO/MMIO events without triggering
an exit.
Signed-off-by: NSasha Levin <levinsasha928@gmail.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

55399a02

KVM: nVMX: Documentation · 823e3965

由 Nadav Har'El 提交于 5月 25, 2011

This patch includes a brief introduction to the nested vmx feature in the
Documentation/kvm directory. The document also includes a copy of the
vmcs12 structure, as requested by Avi Kivity.

[marcelo: move to Documentation/virtual/kvm]
Signed-off-by: NNadav Har'El <nyh@il.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

823e3965

A
KVM: Document KVM_GET_LAPIC, KVM_SET_LAPIC ioctl · e7677933
由 Avi Kivity 提交于 5月 11, 2011
```
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
```
e7677933

07 7月, 2011 2 次提交

Documentation: fix cgroup blkio throttle filenames · 9b61fc4c

由 Andrea Righi 提交于 7月 06, 2011

All the blkio.throttle.* file names are incorrectly reported without
".throttle" in the documentation. Fix it.
Signed-off-by: NAndrea Righi <andrea@betterlinux.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9b61fc4c

Documentation: update CodingStyle memory allocators · 316b3799

由 Jesper Juhl 提交于 7月 06, 2011

The list of available general purpose memory allocators in
Documentation/CodingStyle chapter 14 is incomplete. This patch adds
the missing vzalloc() to the list.
Signed-off-by: NJesper Juhl <jj@chaosbits.net>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

316b3799

03 7月, 2011 2 次提交

hwmon: (k10temp) Update documentation for Fam12h · af75d5b7

由 Clemens Ladisch 提交于 7月 03, 2011

Add some CPU series IDs and links to the Fam12h datasheets.
Signed-off-by: NClemens Ladisch <clemens@ladisch.de>
Signed-off-by: NJean Delvare <khali@linux-fr.org>

af75d5b7

hwmon: (f71882fg) Add support for the F71869A · 5da556e3

由 Hans de Goede 提交于 7月 03, 2011

The F71869A is almost the same as the F71869F/E, except that it has
the normal number of temp and pwm zones for a F71882FG derived chip,
rather then the limited number of the F71869F/E.
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Tested-by: NMax Baldwin <archerseven@gmail.com>
Acked-by: NGuenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: NJean Delvare <khali@linux-fr.org>

5da556e3

02 7月, 2011 2 次提交

PM / Runtime: Update documentation regarding driver removal · f5da24db

由 Rafael J. Wysocki 提交于 7月 02, 2011

Commit e1866b33 (PM / Runtime: Rework
runtime PM handling during driver removal) forgot to update the
documentation in Documentation/power/runtime_pm.txt to match the new
code in drivers/base/dd.c.  Update that documentation to match the
code it describes.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: NKevin Hilman <khilman@ti.com>

f5da24db

PM: Documentation: fix typo: pm_runtime_idle_sync() doesn't exist. · 5efb54cc

由 Kevin Hilman 提交于 6月 30, 2011

Replace reference to pm_runtime_idle_sync() in the driver core with
pm_runtime_put_sync() which is used in the code.
Signed-off-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

5efb54cc

22 6月, 2011 3 次提交

PM / Domains: Update documentation · ca9c6890

由 Rafael J. Wysocki 提交于 6月 21, 2011

Commit 4d27e9dc (PM: Make power
domain callbacks take precedence over subsystem ones) forgot to
update the device power management documentation to take changes
made by it into account.  Correct that mistake.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

ca9c6890

PM: Update documentation regarding sysdevs · 78420884

由 Rafael J. Wysocki 提交于 6月 18, 2011

The part of Documentation/power/devices.txt regarding sysdevs is not
valid any more after commit 2e711c04
(PM: Remove sysdev suspend, resume and shutdown operations), so
remove it.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

78420884

PM / Runtime: Update doc: usage count no longer incremented across system PM · 129b656a

由 Kevin Hilman 提交于 6月 10, 2011

Commit e8665002 (PM: Allow
pm_runtime_suspend() to succeed during system suspend) removed usage
count increment across system PM.

Update doc to reflect this.
Signed-off-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

129b656a

18 6月, 2011 1 次提交

USB: Fix up URB error codes to reflect implementation. · a9e75863

由 Sarah Sharp 提交于 6月 16, 2011

Documentation/usb/error-codes.txt mentions that urb->status can be set to
-EXDEV, if the isochronous transfer was not fully completed.  However, in
practice, EHCI, UHCI, and OHCI all only set -EXDEV in the individual frame
status, never in the URB status.  Those host controller actually always
pass in a zero status to usb_hcd_giveback_urb, and rely on the core to set
the appropriate status value.

The xHCI driver ran into issues with the uvcvideo driver when it tried to
set -EXDEV in urb->status, because the driver refused to submit URBs, and
the userspace camera application's video froze.

Clean up the documentation to reflect the actual implementation.
Signed-off-by: NSarah Sharp <sarah.a.sharp@linux.intel.com>
Acked-by: NAlan Stern <stern@rowland.harvard.edu>

a9e75863

16 6月, 2011 7 次提交

Documentation: fix cgroup typos and formatting · 67de0162

由 Jörg Sommer 提交于 6月 15, 2011

Fix format and spelling.
Signed-off-by: NJörg Sommer <joerg@alea.gnuu.de>
Acked-by: NPaul Menage <menage@google.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

67de0162

Documentation: update cgroupfs mount point · f6e07d38

由 Jörg Sommer 提交于 6月 15, 2011

According to commit 676db4af ("cgroupfs: create /sys/fs/cgroup to
mount cgroupfs on") the canonical mountpoint for the cgroup filesystem
is /sys/fs/cgroup.  Hence, this should be used in the documentation.
Signed-off-by: NJörg Sommer <joerg@alea.gnuu.de>
Acked-by: NPaul Menage <menage@google.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f6e07d38

Documentation: update kmemleak supported archs · 06a2c45d

由 Maxin B. John 提交于 6月 15, 2011

Instead of listing the architectures that are supported by
kmemleak in Documentation/kmemleak.txt, just refer people to
the list of supported architecutures in lib/Kconfig.debug so
that Documentation/kmemleak.txt does not need more updates
for this.
Signed-off-by: NMaxin B. John <maxin.john@gmail.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

06a2c45d

Documentation: update printk-formats.txt · 04c55715

由 Andrew Murray 提交于 6月 15, 2011

This patch updates the incomplete documentation concerning the printk
extended format specifiers.
Signed-off-by: NAndrew Murray <amurray@mpc-data.co.uk>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

04c55715

Documentation/feature-removal-schedule.txt: remove ns_cgroup from feature-removal-schedule.txt · 31b5f8ee

由 akpm@linux-foundation.org 提交于 6月 15, 2011

Commit a77aea92 ("cgroup: remove the ns_cgroup") removed the
ns_cgroup but it forgot to remove the related doc in
feature-removal-schedule.txt.
Signed-off-by: NWANG Cong <xiyou.wangcong@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: Serge E.  Hallyn <serge.hallyn@canonical.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

31b5f8ee

memcg: add documentation for the memory.numastat API · 50c35e5b

由 Ying Han 提交于 6月 15, 2011

[akpm@linux-foundation.org: rework text, fit it into 80-cols]
Signed-off-by: NYing Han <yinghan@google.com>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NBalbir Singh <bsingharora@gmail.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

50c35e5b

backlight: new driver for the ADP8870 backlight devices · a59ec1e7

由 Michael Hennerich 提交于 6月 15, 2011

Signed-off-by: NMichael Hennerich <michael.hennerich@analog.com>
Signed-off-by: NMike Frysinger <vapier@gentoo.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a59ec1e7

15 6月, 2011 1 次提交

rcu: Use softirq to address performance regression · 09223371

由 Shaohua Li 提交于 6月 14, 2011

Commit a26ac245(rcu: move TREE_RCU from softirq to kthread)
introduced performance regression. In an AIM7 test, this commit degraded
performance by about 40%.

The commit runs rcu callbacks in a kthread instead of softirq. We observed
high rate of context switch which is caused by this. Out test system has
64 CPUs and HZ is 1000, so we saw more than 64k context switch per second
which is caused by RCU's per-CPU kthread. A trace showed that most of
the time the RCU per-CPU kthread doesn't actually handle any callbacks,
but instead just does a very small amount of work handling grace periods.
This means that RCU's per-CPU kthreads are making the scheduler do quite
a bit of work in order to allow a very small amount of RCU-related
processing to be done.

Alex Shi's analysis determined that this slowdown is due to lock
contention within the scheduler. Unfortunately, as Peter Zijlstra points
out, the scheduler's real-time semantics require global action, which
means that this contention is inherent in real-time scheduling. (Yes,
perhaps someone will come up with a workaround -- otherwise, -rt is not
going to do well on large SMP systems -- but this patch will work around
this issue in the meantime. And "the meantime" might well be forever.)

This patch therefore re-introduces softirq processing to RCU, but only
for core RCU work. RCU callbacks are still executed in kthread context,
so that only a small amount of RCU work runs in softirq context in the
common case. This should minimize ksoftirqd execution, allowing us to
skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Tested-by: N"Alex,Shi" <alex.shi@intel.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

09223371

09 6月, 2011 1 次提交

md:Documentation/md.txt - fix typo · f699bf23

由 NeilBrown 提交于 6月 09, 2011

Reported-by: NCoolCold <coolthecold@gmail.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

f699bf23

08 6月, 2011 1 次提交

usb-storage: redo incorrect reads · 21c13a4f

由 Alan Stern 提交于 6月 07, 2011

Some USB mass-storage devices have bugs that cause them not to handle
the first READ(10) command they receive correctly.  The Corsair
Padlock v2 returns completely bogus data for its first read (possibly
it returns the data in encrypted form even though the device is
supposed to be unlocked).  The Feiya SD/SDHC card reader fails to
complete the first READ(10) command after it is plugged in or after a
new card is inserted, returning a status code that indicates it thinks
the command was invalid, which prevents the kernel from retrying the
read.

Since the first read of a new device or a new medium is for the
partition sector, the kernel is unable to retrieve the device's
partition table.  Users have to manually issue an "hdparm -z" or
"blockdev --rereadpt" command before they can access the device.

This patch (as1470) works around the problem.  It adds a new quirk
flag, US_FL_INVALID_READ10, indicating that the first READ(10) should
always be retried immediately, as should any failing READ(10) commands
(provided the preceding READ(10) command succeeded, to avoid getting
stuck in a loop).  The patch also adds appropriate unusual_devs
entries containing the new flag.
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
Tested-by: NSven Geggus <sven-usbst@geggus.net>
Tested-by: NPaul Hartman <paul.hartman+linux@gmail.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
CC: <stable@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

21c13a4f

01 6月, 2011 1 次提交

intel-iommu: Enable super page (2MiB, 1GiB, etc.) support · 6dd9a7c7

由 Youquan Song 提交于 5月 25, 2011

There are no externally-visible changes with this. In the loop in the
internal __domain_mapping() function, we simply detect if we are mapping:
  - size >= 2MiB, and
  - virtual address aligned to 2MiB, and
  - physical address aligned to 2MiB, and
  - on hardware that supports superpages.

(and likewise for larger superpages).

We automatically use a superpage for such mappings. We never have to
worry about *breaking* superpages, since we trust that we will always
*unmap* the same range that was mapped. So all we need to do is ensure
that dma_pte_clear_range() will also cope with superpages.

Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so
it can return a PTE at the appropriate level rather than always
extending the page tables all the way down to level 1. Again, this is
simplified by the fact that we should never encounter existing small
pages when we're creating a mapping; any old mapping that used the same
virtual range will have been entirely removed and its obsolete page
tables freed.

Provide an 'intel_iommu=sp_off' argument on the command line as a
chicken bit. Not that it should ever be required.

==

The original commit seen in the iommu-2.6.git was Youquan's
implementation (and completion) of my own half-baked code which I'd
typed into an email. Followed by half a dozen subsequent 'fixes'.

I've taken the unusual step of rewriting history and collapsing the
original commits in order to keep the main history simpler, and make
life easier for the people who are going to have to backport this to
older kernels. And also so I can give it a more coherent commit comment
which (hopefully) gives a better explanation of what's going on.

The original sequence of commits leading to identical code was:

Youquan Song (3):
      intel-iommu: super page support
      intel-iommu: Fix superpage alignment calculation error
      intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()

David Woodhouse (4):
      intel-iommu: Precalculate superpage support for dmar_domain
      intel-iommu: Fix hardware_largepage_caps()
      intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
      intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages
Signed-off-by: NYouquan Song <youquan.song@intel.com>
Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>

6dd9a7c7

30 5月, 2011 2 次提交

lguest: remove support for VIRTIO_F_NOTIFY_ON_EMPTY. · 990c91f0

由 Rusty Russell 提交于 5月 30, 2011

No virtio device does this any more, so no need to clutter lguest with it.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

990c91f0

lguest: fix up compilation after move · bc805a03

由 Rusty Russell 提交于 5月 30, 2011

ed16648e "Move kvm, uml, and lguest
subdirectories" broke the lguest example launcher.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

bc805a03

29 5月, 2011 2 次提交

x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param · 5d4c47e0

由 Len Brown 提交于 4月 01, 2011

mwait_idle() is a C1-only idle loop intended to be more efficient
than HLT on SMP hardware that supports it.

But mwait_idle() has been replaced by the more general
mwait_idle_with_hints(), which handles both C1 and deeper C-states.
ACPI uses only mwait_idle_with_hints(), and never uses mwait_idle().

Deprecate mwait_idle() and the "idle=mwait" cmdline param
to simplify the x86 idle code.

After this change, kernels configured with
(!CONFIG_ACPI=n && !CONFIG_INTEL_IDLE=n) when run on hardware
that support MWAIT will simply use HLT.  If MWAIT is desired
on those systems, cpuidle and the cpuidle drivers above
can be used.

cc: x86@kernel.org
cc: stable@kernel.org # .39.x
Signed-off-by: NLen Brown <len.brown@intel.com>

5d4c47e0

x86 idle: deprecate "no-hlt" cmdline param · cdaab4a0

由 Len Brown 提交于 4月 01, 2011

We'd rather that modern machines not check if HLT works on
every entry into idle, for the benefit of machines that had
marginal electricals 15-years ago.  If those machines are still running
the upstream kernel, they can use "idle=poll".  The only difference
will be that they'll now invoke HLT in machine_hlt().

cc: x86@kernel.org # .39.x
Signed-off-by: NLen Brown <len.brown@intel.com>

cdaab4a0

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功