提交 · e30d9859cf08920ae711f57ecd9726804451d29f · openanolis / cloud-kernel

01 11月, 2010 1 次提交

i2c-i801: Add Intel Patsburg device ID · e30d9859

由 Seth Heasley 提交于 10月 31, 2010

Add support for the Intel Patsburg PCH SMBus Controller.
Signed-off-by: NSeth Heasley <seth.heasley@intel.com>
Signed-off-by: NJean Delvare <khali@linux-fr.org>

e30d9859

30 10月, 2010 1 次提交

kdb: Add kdb kernel module sample · 4aad8f51

由 Jason Wessel 提交于 10月 25, 2010

Add an example of how to add a dynamic kdb shell command via a kernel
module.
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>

4aad8f51

29 10月, 2010 9 次提交

hwmon: (it87) Add support for the IT8721F/IT8758E · 44c1bcd4

由 Jean Delvare 提交于 10月 28, 2010

Add support for the IT8721F/IT8758E. These new chips differ from the
older IT87xxF chips in the following ways:
* ADC LSB is 12 mV instead of 16 mV.
* PWM values are 8-bit instead of 7-bit.
There are other minor changes we don't have to care about in the
driver.

Another change is that we will handle internal voltage scaling in the
driver instead of delegating the work to user-space.
Signed-off-by: NJean Delvare <khali@linux-fr.org>

44c1bcd4

hwmon: (lm85) Add support for ADT7468 high-frequency PWM mode · f6c61cff

由 Jean Delvare 提交于 10月 28, 2010

The ADT7468 supports a high-frequency PWM output mode where all PWM
outputs are driven by a 22.5 kHz clock. Add support for this mode, and
document it, as it may surprise the user that setting one PWM output
frequency also affects the other PWM outputs.
Signed-off-by: NJean Delvare <khali@linux-fr.org>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: NGuenter Roeck <guenter.roeck@ericsson.com>

f6c61cff

hwmon: (lm85) Document the ADT7468 as supported · c36364db

由 Jean Delvare 提交于 10月 28, 2010

Signed-off-by: NJean Delvare <khali@linux-fr.org>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: NGuenter Roeck <guenter.roeck@ericsson.com>

c36364db

hwmon: (pcf8591) Don't attempt to detect devices · 6dfee853

由 Jean Delvare 提交于 10月 28, 2010

The PCF8591 can't be detected, don't even try. There are plenty of
other means to instantiate i2c devices these days.
Signed-off-by: NJean Delvare <khali@linux-fr.org>
Reviewed-by: NGuenter Roeck <guenter.roeck@ericsson.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>

6dfee853

hwmon: (lm90) Add support for the W83L771W/G · c4f99a2b

由 Jean Delvare 提交于 10月 28, 2010

I was wondering if that chip ever existed publicly... Apparently yes,
so add support for it.
Signed-off-by: NJean Delvare <khali@linux-fr.org>
Tested-by: NAlexander Stein <alexander.stein@informatik.tu-chemnitz.de>
Acked-by: NGuenter Roeck <guenter.roeck@ericsson.com>

c4f99a2b

hwmon: (lm90) Add support for max6695 and max6696 · 06e1c0a2

由 Guenter Roeck 提交于 10月 28, 2010

Signed-off-by: NGuenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: NJean Delvare <khali@linux-fr.org>

06e1c0a2

G
hwmon: (lm90) Add support for extra features of max6659 · 6948708d
由 Guenter Roeck 提交于 10月 28, 2010
```
Signed-off-by: NGuenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: NJean Delvare <khali@linux-fr.org>
```
6948708d

hwmon: (lm90) Add explicit support for max6659 · 13c84951

由 Guenter Roeck 提交于 10月 28, 2010

Signed-off-by: NGuenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: NJean Delvare <khali@linux-fr.org>

13c84951

hwmon: Add tempX_emergency attribute to sysfs ABI · 28e7438f

由 Guenter Roeck 提交于 10月 28, 2010

Signed-off-by: NGuenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: NJean Delvare <khali@linux-fr.org>

28e7438f

28 10月, 2010 9 次提交

fs/9p: Add access = client option to opt in acl evaluation. · 76381a42

由 Aneesh Kumar K.V 提交于 9月 28, 2010

Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NVenkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>

76381a42

ext4: add support for lazy inode table initialization · bfff6873

由 Lukas Czerner 提交于 10月 27, 2010

When the lazy_itable_init extended option is passed to mke2fs, it
considerably speeds up filesystem creation because inode tables are
not zeroed out.  The fact that parts of the inode table are
uninitialized is not a problem so long as the block group descriptors,
which contain information regarding how much of the inode table has
been initialized, has not been corrupted However, if the block group
checksums are not valid, e2fsck must scan the entire inode table, and
the the old, uninitialized data could potentially cause e2fsck to
report false problems.

Hence, it is important for the inode tables to be initialized as soon
as possble.  This commit adds this feature so that mke2fs can safely
use the lazy inode table initialization feature to speed up formatting
file systems.

This is done via a new new kernel thread called ext4lazyinit, which is
created on demand and destroyed, when it is no longer needed.  There
is only one thread for all ext4 filesystems in the system. When the
first filesystem with inititable mount option is mounted, ext4lazyinit
thread is created, then the filesystem can register its request in the
request list.

This thread then walks through the list of requests picking up
scheduled requests and invoking ext4_init_inode_table(). Next schedule
time for the request is computed by multiplying the time it took to
zero out last inode table with wait multiplier, which can be set with
the (init_itable=n) mount option (default is 10).  We are doing
this so we do not take the whole I/O bandwidth. When the thread is no
longer necessary (request list is empty) it frees the appropriate
structures and exits (and can be created later later by another
filesystem).

We do not disturb regular inode allocations in any way, it just do not
care whether the inode table is, or is not zeroed. But when zeroing, we
have to skip used inodes, obviously. Also we should prevent new inode
allocations from the group, while zeroing is on the way. For that we
take write alloc_sem lock in ext4_init_inode_table() and read alloc_sem
in the ext4_claim_inode, so when we are unlucky and allocator hits the
group which is currently being zeroed, it just has to wait.

This can be suppresed using the mount option no_init_itable.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

bfff6873

delay-accounting: reimplement -c for getdelays.c to report information on a target command · db9e5679

由 Mel Gorman 提交于 10月 27, 2010

Task delay-accounting was identified as one means of determining how long
a process spends in congestion_wait() without adding new statistics.  For
example, if the workload should not be doing IO, delay-accounting could
reveal how long it was spending in unexpected IO or delays.
Unfortunately, on closer examination it was clear that getdelays does not
act as documented.

Commit a3baf649 ("per-task-delay-accounting: documentation") added
Documentation/accounting/getdelays.c with a -c switch that was documented
to fork/exec a child and report statistics on it but for reasons that are
unclear to me, commit 9e06d3f9 deleted support for this switch but did not
update the documentation.  It might be an oversight or it might be because
the control flow of the program meant that accounting information would be
printed once early in the lifetime of the program making it of limited
use.

This patch reimplements -c for getdelays.c to act as documented.  Unlike
the original version, it waits until the command completes before printing
any information on it.  An example of it being used looks like

$ ./getdelays -d -c find /home/mel -name mel
print delayacct stats ON
/home/mel
/home/mel/.notes-wine/drive_c/windows/profiles/mel
/home/mel/.wine/drive_c/windows/profiles/mel
/home/mel/git-configs/dot.kde/share/apps/konqueror/home/mel
PID	5923

CPU             count     real total  virtual total    delay total
                42779     5051232096     5164722692      564207988
IO              count    delay total
                41727    97804147758
SWAP            count    delay total
                    0              0
RECLAIM         count    delay total
                    0              0

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Acked-by: NBalbir Singh <balbir@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

db9e5679

/proc/pid/pagemap: document in Documentation/filesystems/proc.txt · 03f890f8

由 Nikanth Karthikesan 提交于 10月 27, 2010

Document /proc/pid/pagemap in Documentation/filesystems/proc.txt
Signed-off-by: NNikanth Karthikesan <knikanth@suse.de>
Cc: Richard Guenther <rguenther@suse.de>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NMatt Mackall <mpm@selenic.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

03f890f8

/proc/pid/smaps: export amount of anonymous memory in a mapping · b40d4f84

由 Nikanth Karthikesan 提交于 10月 27, 2010

Export the number of anonymous pages in a mapping via smaps.

Even the private pages in a mapping backed by a file, would be marked as
anonymous, when they are modified. Export this information to user-space via
smaps.

Exporting this count will help gdb to make a better decision on which
areas need to be dumped in its coredump; and should be useful to others
studying the memory usage of a process.
Signed-off-by: NNikanth Karthikesan <knikanth@suse.de>
Acked-by: NHugh Dickins <hughd@google.com>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b40d4f84

cgroup: notify ns_cgroup deprecated · 45531757

由 Daniel Lezcano 提交于 10月 27, 2010

The ns_cgroup will be removed very soon.  Let's warn, for this version,
ns_cgroup is deprecated.

Make ns_cgroup and clone_children exclusive.  If the clone_children is set
and the ns_cgroup is mounted, let's fail with EINVAL when the ns_cgroup
subsys is created (a printk will help the user to understand why the
creation fails).

Update the feature remove schedule file with the deprecated ns_cgroup.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr>
Acked-by: NPaul Menage <menage@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

45531757

cgroup: add clone_children control file · 97978e6d

由 Daniel Lezcano 提交于 10月 27, 2010

The ns_cgroup is a control group interacting with the namespaces.  When a
new namespace is created, a corresponding cgroup is automatically created
too.  The cgroup name is the pid of the process who did 'unshare' or the
child of 'clone'.

This cgroup is tied with the namespace because it prevents a process to
escape the control group and use the post_clone callback, so the child
cgroup inherits the values of the parent cgroup.

Unfortunately, the more we use this cgroup and the more we are facing
problems with it:

(1) when a process unshares, the cgroup name may conflict with a
    previous cgroup with the same pid, so unshare or clone return -EEXIST

(2) the cgroup creation is out of control because there may have an
    application creating several namespaces where the system will
    automatically create several cgroups in his back and let them on the
    cgroupfs (eg.  a vrf based on the network namespace).

(3) the mix of (1) and (2) force an administrator to regularly check
    and clean these cgroups.

This patchset removes the ns_cgroup by adding a new flag to the cgroup and
the cgroupfs mount option.  It enables the copy of the parent cgroup when
a child cgroup is created.  We can then safely remove the ns_cgroup as
this flag brings a compatibility.  We have now to manually create and add
the task to a cgroup, which is consistent with the cgroup framework.

This patch:

Sent as an answer to a previous thread around the ns_cgroup.

https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html

It adds a control file 'clone_children' for a cgroup.  This control file
is a boolean specifying if the child cgroup should be a clone of the
parent cgroup or not.  The default value is 'false'.

This flag makes the child cgroup to call the post_clone callback of all
the subsystem, if it is available.

At present, the cpuset is the only one which had implemented the
post_clone callback.

The option can be set at mount time by specifying the 'clone_children'
mount option.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: NPaul Menage <menage@google.com>
Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Cc: Matt Helsley <matthltc@us.ibm.com>
Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

97978e6d

doc: clarify the behaviour of dirty_ratio/dirty_bytes · abffc020

由 Andrea Righi 提交于 10月 27, 2010

When dirty_ratio or dirty_bytes is written the other parameter is disabled
and set to 0 (in dirty_bytes_handler() / dirty_ratio_handler()).

We do the same for dirty_background_ratio and dirty_background_bytes.

However, in the sysctl documentation, we say that the counterpart becomes
a function of the old value, that is not correct.

Clarify the documentation reporting the actual behaviour.
Reviewed-by: NGreg Thelen <gthelen@google.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrea Righi <arighi@develer.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

abffc020

Coccinelle: Fix documentation · 9dcf7990

由 Nicolas Palix 提交于 10月 24, 2010

A file used as example has been moved elsewhere.
Update the documentation accordingly
Signed-off-by: NNicolas Palix <npalix.work@gmail.com>
Reported-by: NJulia Lawall <julia@diku.dk>
Signed-off-by: NMichal Marek <mmarek@suse.cz>

9dcf7990

27 10月, 2010 10 次提交

docbook: add idr/ida to kernel-api docbook · 56083ab1

由 Randy Dunlap 提交于 10月 26, 2010

Add idr/ida to kernel-api docbook.
Fix typos and kernel-doc notation.
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: Naohiro Aota <naota@elisp.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

56083ab1

docbook: add more wait/wake/completion to device-drivers docbook · ee2f154a

由 Randy Dunlap 提交于 10月 26, 2010

Add more wait, wake, and completion interfaces to the device-drivers
docbook.

Fix kernel-doc notation in the added files.
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ee2f154a

documentation: update sysrq.txt magic sysrq keys · 003bb8ab

由 Randy Dunlap 提交于 10月 26, 2010

Update Documentation/sysrq.txt magic sysrq keys:

 - 'g' is for kgdb (not arch-specific);
 - add 2 new uses for 'v', remove the Voyager info;
 - add 'y' info (SPARC-64 specific);
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <James.Bottomley@suse.de>
Cc: David Airlie <airlied@linux.ie>
Acked-by: NAlexander Shishkin <virtuoso@slind.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

003bb8ab

Documentation: short descriptions for bh1770glc and apds990x drivers · 3f0f4a3f

由 Samu Onkalo 提交于 10月 26, 2010

Add short documentation for two ALS / proximity chip drivers.
Signed-off-by: NSamu Onkalo <samu.p.onkalo@nokia.com>
Acked-by: NJonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3f0f4a3f

Documentation/timers/hpet_example.c: add supporting info for hpet_example · aaaddfe0

由 Jaswinder Singh Rajput 提交于 10月 26, 2010

$./hpet_example info /dev/hpet
-hpet: executing info
hpet_info: hi_irqfreq 0x0 hi_flags 0x0 hi_hpet 0 hi_timer 2
Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: "Venkatesh Pallipadi (Venki)" <venki@google.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aaaddfe0

mm: highmem documentation · d65bfacb

由 Peter Zijlstra 提交于 10月 26, 2010

Document outlining some of the highmem issues, started by me, edited by
David.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NChris Metcalf <cmetcalf@tilera.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d65bfacb

tracing, vmscan: add trace events for LRU list shrinking · e11da5b4

由 Mel Gorman 提交于 10月 26, 2010

There have been numerous reports of stalls that pointed at the problem
being somewhere in the VM. There are multiple roots to the problems which
means dealing with any of the root problems in isolation is tricky to
justify on their own and they would still need integration testing. This
patch series puts together two different patch sets which in combination
should tackle some of the root causes of latency problems being reported.

Patch 1 adds a tracepoint for shrink_inactive_list. For this series, the
most important results is being able to calculate the scanning/reclaim
ratio as a measure of the amount of work being done by page reclaim.

Patch 2 accounts for time spent in congestion_wait.

Patches 3-6 were originally developed by Kosaki Motohiro but reworked for
this series. It has been noted that lumpy reclaim is far too aggressive
and trashes the system somewhat. As SLUB uses high-order allocations, a
large cost incurred by lumpy reclaim will be noticeable. It was also
reported during transparent hugepage support testing that lumpy reclaim
was trashing the system and these patches should mitigate that problem
without disabling lumpy reclaim.

Patch 7 adds wait_iff_congested() and replaces some callers of
congestion_wait(). wait_iff_congested() only sleeps if there is a BDI
that is currently congested. Patch 8 notes that any BDI being congested
is not necessarily a problem because there could be multiple BDIs of
varying speeds and numberous zones. It attempts to track when a zone
being reclaimed contains many pages backed by a congested BDI and if so,
reclaimers wait on the congestion queue.

I ran a number of tests with monitoring on X86, X86-64 and PPC64. Each
machine had 3G of RAM and the CPUs were

X86: Intel P4 2-core
X86-64: AMD Phenom 4-core
PPC64: PPC970MP

Each used a single disk and the onboard IO controller. Dirty ratio was
left at 20. I'm just going to report for X86-64 and PPC64 in a vague
attempt to keep this report short. Four kernels were tested each based on
v2.6.36-rc4

traceonly-v2r2: Patches 1 and 2 to instrument vmscan reclaims and congestion_wait
lowlumpy-v2r3: Patches 1-6 to test if lumpy reclaim is better
waitcongest-v2r3: Patches 1-7 to only wait on congestion
waitwriteback-v2r4: Patches 1-8 to detect when a zone is congested

nocongest-v1r5: Patches 1-3 for testing wait_iff_congestion
nodirect-v1r5: Patches 1-10 to disable filesystem writeback for better IO

The tests run were as follows

kernbench
compile-based benchmark. Smoke test performance

sysbench
OLTP read-only benchmark. Will be re-run in the future as read-write

micro-mapped-file-stream
This is a micro-benchmark from Johannes Weiner that accesses a
large sparse-file through mmap(). It was configured to run in only
single-CPU mode but can be indicative of how well page reclaim
identifies suitable pages.

stress-highalloc
Tries to allocate huge pages under heavy load.

kernbench, iozone and sysbench did not report any performance regression
on any machine. sysbench did pressure the system lightly and there was
reclaim activity but there were no difference of major interest between
the kernels.

X86-64 micro-mapped-file-stream

traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4
pgalloc_dma 1639.00 ( 0.00%) 667.00 (-145.73%) 1167.00 ( -40.45%) 578.00 (-183.56%)
pgalloc_dma32 2842410.00 ( 0.00%) 2842626.00 ( 0.01%) 2843043.00 ( 0.02%) 2843014.00 ( 0.02%)
pgalloc_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
pgsteal_dma 729.00 ( 0.00%) 85.00 (-757.65%) 609.00 ( -19.70%) 125.00 (-483.20%)
pgsteal_dma32 2338721.00 ( 0.00%) 2447354.00 ( 4.44%) 2429536.00 ( 3.74%) 2436772.00 ( 4.02%)
pgsteal_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
pgscan_kswapd_dma 1469.00 ( 0.00%) 532.00 (-176.13%) 1078.00 ( -36.27%) 220.00 (-567.73%)
pgscan_kswapd_dma32 4597713.00 ( 0.00%) 4503597.00 ( -2.09%) 4295673.00 ( -7.03%) 3891686.00 ( -18.14%)
pgscan_kswapd_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
pgscan_direct_dma 71.00 ( 0.00%) 134.00 ( 47.01%) 243.00 ( 70.78%) 352.00 ( 79.83%)
pgscan_direct_dma32 305820.00 ( 0.00%) 280204.00 ( -9.14%) 600518.00 ( 49.07%) 957485.00 ( 68.06%)
pgscan_direct_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
pageoutrun 16296.00 ( 0.00%) 21254.00 ( 23.33%) 18447.00 ( 11.66%) 20067.00 ( 18.79%)
allocstall 443.00 ( 0.00%) 273.00 ( -62.27%) 513.00 ( 13.65%) 1568.00 ( 71.75%)

These are based on the raw figures taken from /proc/vmstat. It's a rough
measure of reclaim activity. Note that allocstall counts are higher
because we are entering direct reclaim more often as a result of not
sleeping in congestion. In itself, it's not necessarily a bad thing.
It's easier to get a view of what happened from the vmscan tracepoint
report.

FTrace Reclaim Statistics: vmscan

traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4
Direct reclaims 443 273 513 1568
Direct reclaim pages scanned 305968 280402 600825 957933
Direct reclaim pages reclaimed 43503 19005 30327 117191
Direct reclaim write file async I/O 0 0 0 0
Direct reclaim write anon async I/O 0 3 4 12
Direct reclaim write file sync I/O 0 0 0 0
Direct reclaim write anon sync I/O 0 0 0 0
Wake kswapd requests 187649 132338 191695 267701
Kswapd wakeups 3 1 4 1
Kswapd pages scanned 4599269 4454162 4296815 3891906
Kswapd pages reclaimed 2295947 2428434 2399818 2319706
Kswapd reclaim write file async I/O 1 0 1 1
Kswapd reclaim write anon async I/O 59 187 41 222
Kswapd reclaim write file sync I/O 0 0 0 0
Kswapd reclaim write anon sync I/O 0 0 0 0
Time stalled direct reclaim (seconds) 4.34 2.52 6.63 2.96
Time kswapd awake (seconds) 11.15 10.25 11.01 10.19

Total pages scanned 4905237 4734564 4897640 4849839
Total pages reclaimed 2339450 2447439 2430145 2436897
%age total pages scanned/reclaimed 47.69% 51.69% 49.62% 50.25%
%age total pages scanned/written 0.00% 0.00% 0.00% 0.00%
%age file pages scanned/written 0.00% 0.00% 0.00% 0.00%
Percentage Time Spent Direct Reclaim 29.23% 19.02% 38.48% 20.25%
Percentage Time kswapd Awake 78.58% 78.85% 76.83% 79.86%

What is interesting here for nocongest in particular is that while direct
reclaim scans more pages, the overall number of pages scanned remains the
same and the ratio of pages scanned to pages reclaimed is more or less the
same. In other words, while we are sleeping less, reclaim is not doing
more work and as direct reclaim and kswapd is awake for less time, it
would appear to be doing less work.

FTrace Reclaim Statistics: congestion_wait
Direct number congest waited 87 196 64 0
Direct time congest waited 4604ms 4732ms 5420ms 0ms
Direct full congest waited 72 145 53 0
Direct number conditional waited 0 0 324 1315
Direct time conditional waited 0ms 0ms 0ms 0ms
Direct full conditional waited 0 0 0 0
KSwapd number congest waited 20 10 15 7
KSwapd time congest waited 1264ms 536ms 884ms 284ms
KSwapd full congest waited 10 4 6 2
KSwapd number conditional waited 0 0 0 0
KSwapd time conditional waited 0ms 0ms 0ms 0ms
KSwapd full conditional waited 0 0 0 0

The vanilla kernel spent 8 seconds asleep in direct reclaim and no time at
all asleep with the patches.

MMTests Statistics: duration
User/Sys Time Running Test (seconds) 10.51 10.73 10.6 11.66
Total Elapsed Time (seconds) 14.19 13.00 14.33 12.76

Overall, the tests completed faster. It is interesting to note that backing off further
when a zone is congested and not just a BDI was more efficient overall.

PPC64 micro-mapped-file-stream
pgalloc_dma 3024660.00 ( 0.00%) 3027185.00 ( 0.08%) 3025845.00 ( 0.04%) 3026281.00 ( 0.05%)
pgalloc_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
pgsteal_dma 2508073.00 ( 0.00%) 2565351.00 ( 2.23%) 2463577.00 ( -1.81%) 2532263.00 ( 0.96%)
pgsteal_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
pgscan_kswapd_dma 4601307.00 ( 0.00%) 4128076.00 ( -11.46%) 3912317.00 ( -17.61%) 3377165.00 ( -36.25%)
pgscan_kswapd_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
pgscan_direct_dma 629825.00 ( 0.00%) 971622.00 ( 35.18%) 1063938.00 ( 40.80%) 1711935.00 ( 63.21%)
pgscan_direct_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
pageoutrun 27776.00 ( 0.00%) 20458.00 ( -35.77%) 18763.00 ( -48.04%) 18157.00 ( -52.98%)
allocstall 977.00 ( 0.00%) 2751.00 ( 64.49%) 2098.00 ( 53.43%) 5136.00 ( 80.98%)

Similar trends to x86-64. allocstalls are up but it's not necessarily bad.

FTrace Reclaim Statistics: vmscan
Direct reclaims 977 2709 2098 5136
Direct reclaim pages scanned 629825 963814 1063938 1711935
Direct reclaim pages reclaimed 75550 242538 150904 387647
Direct reclaim write file async I/O 0 0 0 2
Direct reclaim write anon async I/O 0 10 0 4
Direct reclaim write file sync I/O 0 0 0 0
Direct reclaim write anon sync I/O 0 0 0 0
Wake kswapd requests 392119 1201712 571935 571921
Kswapd wakeups 3 2 3 3
Kswapd pages scanned 4601307 4128076 3912317 3377165
Kswapd pages reclaimed 2432523 2318797 2312673 2144616
Kswapd reclaim write file async I/O 20 1 1 1
Kswapd reclaim write anon async I/O 57 132 11 121
Kswapd reclaim write file sync I/O 0 0 0 0
Kswapd reclaim write anon sync I/O 0 0 0 0
Time stalled direct reclaim (seconds) 6.19 7.30 13.04 10.88
Time kswapd awake (seconds) 21.73 26.51 25.55 23.90

Total pages scanned 5231132 5091890 4976255 5089100
Total pages reclaimed 2508073 2561335 2463577 2532263
%age total pages scanned/reclaimed 47.95% 50.30% 49.51% 49.76%
%age total pages scanned/written 0.00% 0.00% 0.00% 0.00%
%age file pages scanned/written 0.00% 0.00% 0.00% 0.00%
Percentage Time Spent Direct Reclaim 18.89% 20.65% 32.65% 27.65%
Percentage Time kswapd Awake 72.39% 80.68% 78.21% 77.40%

Again, a similar trend that the congestion_wait changes mean that direct
reclaim scans more pages but the overall number of pages scanned while
slightly reduced, are very similar. The ratio of scanning/reclaimed
remains roughly similar. The downside is that kswapd and direct reclaim
was awake longer and for a larger percentage of the overall workload.
It's possible there were big differences in the amount of time spent
reclaiming slab pages between the different kernels which is plausible
considering that the micro tests runs after fsmark and sysbench.

Trace Reclaim Statistics: congestion_wait
Direct number congest waited 845 1312 104 0
Direct time congest waited 19416ms 26560ms 7544ms 0ms
Direct full congest waited 745 1105 72 0
Direct number conditional waited 0 0 1322 2935
Direct time conditional waited 0ms 0ms 12ms 312ms
Direct full conditional waited 0 0 0 3
KSwapd number congest waited 39 102 75 63
KSwapd time congest waited 2484ms 6760ms 5756ms 3716ms
KSwapd full congest waited 20 48 46 25
KSwapd number conditional waited 0 0 0 0
KSwapd time conditional waited 0ms 0ms 0ms 0ms
KSwapd full conditional waited 0 0 0 0

The vanilla kernel spent 20 seconds asleep in direct reclaim and only
312ms asleep with the patches. The time kswapd spent congest waited was
also reduced by a large factor.

MMTests Statistics: duration
ser/Sys Time Running Test (seconds) 26.58 28.05 26.9 28.47
Total Elapsed Time (seconds) 30.02 32.86 32.67 30.88

With all patches applies, the completion times are very similar.

X86-64 STRESS-HIGHALLOC
traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3waitwriteback-v2r4
Pass 1 82.00 ( 0.00%) 84.00 ( 2.00%) 85.00 ( 3.00%) 85.00 ( 3.00%)
Pass 2 90.00 ( 0.00%) 87.00 (-3.00%) 88.00 (-2.00%) 89.00 (-1.00%)
At Rest 92.00 ( 0.00%) 90.00 (-2.00%) 90.00 (-2.00%) 91.00 (-1.00%)

Success figures across the board are broadly similar.

traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3waitwriteback-v2r4
Direct reclaims 1045 944 886 887
Direct reclaim pages scanned 135091 119604 109382 101019
Direct reclaim pages reclaimed 88599 47535 47863 46671
Direct reclaim write file async I/O 494 283 465 280
Direct reclaim write anon async I/O 29357 13710 16656 13462
Direct reclaim write file sync I/O 154 2 2 3
Direct reclaim write anon sync I/O 14594 571 509 561
Wake kswapd requests 7491 933 872 892
Kswapd wakeups 814 778 731 780
Kswapd pages scanned 7290822 15341158 11916436 13703442
Kswapd pages reclaimed 3587336 3142496 3094392 3187151
Kswapd reclaim write file async I/O 91975 32317 28022 29628
Kswapd reclaim write anon async I/O 1992022 789307 829745 849769
Kswapd reclaim write file sync I/O 0 0 0 0
Kswapd reclaim write anon sync I/O 0 0 0 0
Time stalled direct reclaim (seconds) 4588.93 2467.16 2495.41 2547.07
Time kswapd awake (seconds) 2497.66 1020.16 1098.06 1176.82

Total pages scanned 7425913 15460762 12025818 13804461
Total pages reclaimed 3675935 3190031 3142255 3233822
%age total pages scanned/reclaimed 49.50% 20.63% 26.13% 23.43%
%age total pages scanned/written 28.66% 5.41% 7.28% 6.47%
%age file pages scanned/written 1.25% 0.21% 0.24% 0.22%
Percentage Time Spent Direct Reclaim 57.33% 42.15% 42.41% 42.99%
Percentage Time kswapd Awake 43.56% 27.87% 29.76% 31.25%

Scanned/reclaimed ratios again look good with big improvements in
efficiency. The Scanned/written ratios also look much improved. With a
better scanned/written ration, there is an expectation that IO would be
more efficient and indeed, the time spent in direct reclaim is much
reduced by the full series and kswapd spends a little less time awake.

Overall, indications here are that allocations were happening much faster
and this can be seen with a graph of the latency figures as the
allocations were taking place
http://www.csn.ul.ie/~mel/postings/vmscanreduce-20101509/highalloc-interlatency-hydra-mean.ps

FTrace Reclaim Statistics: congestion_wait
Direct number congest waited 1333 204 169 4
Direct time congest waited 78896ms 8288ms 7260ms 200ms
Direct full congest waited 756 92 69 2
Direct number conditional waited 0 0 26 186
Direct time conditional waited 0ms 0ms 0ms 2504ms
Direct full conditional waited 0 0 0 25
KSwapd number congest waited 4 395 227 282
KSwapd time congest waited 384ms 25136ms 10508ms 18380ms
KSwapd full congest waited 3 232 98 176
KSwapd number conditional waited 0 0 0 0
KSwapd time conditional waited 0ms 0ms 0ms 0ms
KSwapd full conditional waited 0 0 0 0
KSwapd full conditional waited 318 0 312 9

Overall, the time spent speeping is reduced. kswapd is still hitting
congestion_wait() but that is because there are callers remaining where it
wasn't clear in advance if they should be changed to wait_iff_congested()
or not. Overall the sleep imes are reduced though - from 79ish seconds to
about 19.

MMTests Statistics: duration
User/Sys Time Running Test (seconds) 3415.43 3386.65 3388.39 3377.5
Total Elapsed Time (seconds) 5733.48 3660.33 3689.41 3765.39

With the full series, the time to complete the tests are reduced by 30%

PPC64 STRESS-HIGHALLOC
traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3waitwriteback-v2r4
Pass 1 17.00 ( 0.00%) 34.00 (17.00%) 38.00 (21.00%) 43.00 (26.00%)
Pass 2 25.00 ( 0.00%) 37.00 (12.00%) 42.00 (17.00%) 46.00 (21.00%)
At Rest 49.00 ( 0.00%) 43.00 (-6.00%) 45.00 (-4.00%) 51.00 ( 2.00%)

Success rates there are *way* up particularly considering that the 16MB
huge pages on PPC64 mean that it's always much harder to allocate them.

FTrace Reclaim Statistics: vmscan
stress-highalloc stress-highalloc stress-highalloc stress-highalloc
traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3waitwriteback-v2r4
Direct reclaims 499 505 564 509
Direct reclaim pages scanned 223478 41898 51818 45605
Direct reclaim pages reclaimed 137730 21148 27161 23455
Direct reclaim write file async I/O 399 136 162 136
Direct reclaim write anon async I/O 46977 2865 4686 3998
Direct reclaim write file sync I/O 29 0 1 3
Direct reclaim write anon sync I/O 31023 159 237 239
Wake kswapd requests 420 351 360 326
Kswapd wakeups 185 294 249 277
Kswapd pages scanned 15703488 16392500 17821724 17598737
Kswapd pages reclaimed 5808466 2908858 3139386 3145435
Kswapd reclaim write file async I/O 159938 18400 18717 13473
Kswapd reclaim write anon async I/O 3467554 228957 322799 234278
Kswapd reclaim write file sync I/O 0 0 0 0
Kswapd reclaim write anon sync I/O 0 0 0 0
Time stalled direct reclaim (seconds) 9665.35 1707.81 2374.32 1871.23
Time kswapd awake (seconds) 9401.21 1367.86 1951.75 1328.88

Total pages scanned 15926966 16434398 17873542 17644342
Total pages reclaimed 5946196 2930006 3166547 3168890
%age total pages scanned/reclaimed 37.33% 17.83% 17.72% 17.96%
%age total pages scanned/written 23.27% 1.52% 1.94% 1.43%
%age file pages scanned/written 1.01% 0.11% 0.11% 0.08%
Percentage Time Spent Direct Reclaim 44.55% 35.10% 41.42% 36.91%
Percentage Time kswapd Awake 86.71% 43.58% 52.67% 41.14%

While the scanning rates are slightly up, the scanned/reclaimed and
scanned/written figures are much improved. The time spent in direct
reclaim and with kswapd are massively reduced, mostly by the lowlumpy
patches.

FTrace Reclaim Statistics: congestion_wait
Direct number congest waited 725 303 126 3
Direct time congest waited 45524ms 9180ms 5936ms 300ms
Direct full congest waited 487 190 52 3
Direct number conditional waited 0 0 200 301
Direct time conditional waited 0ms 0ms 0ms 1904ms
Direct full conditional waited 0 0 0 19
KSwapd number congest waited 0 2 23 4
KSwapd time congest waited 0ms 200ms 420ms 404ms
KSwapd full congest waited 0 2 2 4
KSwapd number conditional waited 0 0 0 0
KSwapd time conditional waited 0ms 0ms 0ms 0ms
KSwapd full conditional waited 0 0 0 0

Not as dramatic a story here but the time spent asleep is reduced and we
can still see what wait_iff_congested is going to sleep when necessary.

MMTests Statistics: duration
User/Sys Time Running Test (seconds) 12028.09 3157.17 3357.79 3199.16
Total Elapsed Time (seconds) 10842.07 3138.72 3705.54 3229.85

The time to complete this test goes way down. With the full series, we
are allocating over twice the number of huge pages in 30% of the time and
there is a corresponding impact on the allocation latency graph available
at.

http://www.csn.ul.ie/~mel/postings/vmscanreduce-20101509/highalloc-interlatency-powyah-mean.ps

This patch:

Add a trace event for shrink_inactive_list() and updates the sample
postprocessing script appropriately. It can be used to determine how many
pages were reclaimed and for non-lumpy reclaim where exactly the pages
were reclaimed from.
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e11da5b4

Documentation/filesystems/proc.txt: improve smaps field documentation · 0f4d208f

由 Matt Mackall 提交于 10月 26, 2010

Signed-off-by: NMatt Mackall <mpm@selenic.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0f4d208f

resources: support allocating space within a region from the top down · e7f8567d

由 Bjorn Helgaas 提交于 10月 26, 2010

Allocate space from the top of a region first, then work downward,
if an architecture desires this.

When we allocate space from a resource, we look for gaps between children
of the resource.  Previously, we always looked at gaps from the bottom up.
For example, given this:

    [mem 0xbff00000-0xf7ffffff] PCI Bus 0000:00
      [mem 0xbff00000-0xbfffffff] gap -- available
      [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02
      [mem 0xe0000000-0xf7ffffff] gap -- available

we attempted to allocate from the [mem 0xbff00000-0xbfffffff] gap first,
then the [mem 0xe0000000-0xf7ffffff] gap.

With this patch an architecture can choose to allocate from the top gap
[mem 0xe0000000-0xf7ffffff] first.

We can't do this across the board because iomem_resource.end is initialized
to 0xffffffff_ffffffff on 64-bit architectures, and most machines can't
address the entire 64-bit physical address space.  Therefore, we only
allocate top-down if the arch requests it by clearing
"resource_alloc_from_bottom".
Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>

e7f8567d

NFS: rename nfs.upcall -> nfs.idmap · eb1c86b8

由 Bryan Schumaker 提交于 10月 26, 2010

This patch renames the idmapper upcall program from nfs.upcall to nfs.idmap in
the NFS documentation.  This is because the program has been renamed in the
nfs-utils source.
Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

eb1c86b8

26 10月, 2010 3 次提交

update block_device_operations documentation · e1455d1b

由 Christoph Hellwig 提交于 10月 06, 2010

Updated Documentation/filesystems/Locking to match the code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>

e1455d1b

Documentation: Fix trivial typo in filesystems/sharedsubtree.txt · d9d1dc80

由 Valerie Aurora 提交于 8月 30, 2010

Documentation: Fix trivial typo in filesystems/sharedsubtree.txt

This typo is easy to ignore unless you have spent a great deal of time
thinking about how to eliminate duplicate dentries in unions.
Signed-off-by: NValerie Aurora <vaurora@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d9d1dc80

hwmon: LTC4261 Hardware monitoring driver · e5f5c99a

由 Guenter Roeck 提交于 6月 25, 2010

This driver adds support for Linear Technology LTC4261 I2C Negative
Voltage Hot Swap Controller.
Reviewed-by: NIra W. Snyder <iws@ovro.caltech.edu>
Reviewed-by: NTom Grennan <tom.grennan@ericsson.com>
Signed-off-by: NGuenter Roeck <guenter.roeck@ericsson.com>

e5f5c99a

25 10月, 2010 4 次提交

[S390] topology: change default · c9af3fa9

由 Heiko Carstens 提交于 10月 25, 2010

Switch default value of the kernel parameter 'topology' from off to on.
Various performance measurements have finally shown that there are no
(known) regressions anywhere.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

c9af3fa9

mmc: make number of mmcblk minors configurable · 5e71b7a6

由 Olof Johansson 提交于 9月 17, 2010

The old limit of number of minor numbers per mmcblk device was hardcoded
at 8.  This isn't enough for some of the more elaborate partitioning
schemes, for example those used by Chrome OS.

Since there might be a bunch of systems out there with static /dev
contents that relies on the old numbering scheme, let's make it a
build-time option with the default set to the previous 8.

Also provide a boot/modprobe-time parameter to override the config
default: mmcblk.perdev_minors.
Signed-off-by: NOlof Johansson <olof@lixom.net>
Cc: Mandeep Baines <msb@chromium.org>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NChris Ball <cjb@laptop.org>

5e71b7a6

phylib: make local function static · 89ff05ec

由 stephen hemminger 提交于 10月 21, 2010

The following functions are not used directly by any drivers:
    phy_attach_direct
    phy_device_create
    phy_prepare_link
    genphy_config_advert
    genphy_setup_forced
    phy_config_interrupt
    phy_clear_interrypt
    phy_sanitize_settings
    phy_enable_interrupts
    phy_disable_interrupts
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89ff05ec

NFSv4.1: pnfs: full mount/umount infrastructure · 02c35fca

由 Fred Isaman 提交于 10月 20, 2010

Allow a module implementing a layout type to register, and
have its mount/umount routines called for filesystems that
the server declares support it.
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NMarc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: NBian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: NBenny Halevy <bhalevy@panasas.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

02c35fca

24 10月, 2010 3 次提交

KVM: document 'kvm.mmu_audit' parameter · a182d873

由 Xiao Guangrong 提交于 9月 20, 2010

Document this parameter into Documentation/kernel-parameters.txt
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

a182d873

KVM: fix the description of kvm-amd.nested in documentation · 8475f94a

由 Xiao Guangrong 提交于 9月 20, 2010

The default state of 'kvm-amd.nested' is enabled now, so fix the documentation
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

8475f94a

A
KVM: Document that KVM_GET_SUPPORTED_CPUID may return emulated values · c39cbd2a
由 Avi Kivity 提交于 9月 12, 2010
```
Signed-off-by: NAvi Kivity <avi@redhat.com>
```
c39cbd2a

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功