Commits · 99a15e21d96f6857dafab1e5167e5e8183215c9c · openeuler / Kernel

Jun 17, 2011 · 7 commits

migrate: don't account swapcache as shmem · 99a15e21

Committed by Andrea Arcangeli on Jun 16, 2011

Swapcache pages reach the NR_SHMEM accounting code path in
migrate_page_move_mapping(), but swapcache is accounted as NR_FILE_PAGES
and not as NR_SHMEM.

Hugh pointed out we must use PageSwapCache instead of comparing
mapping to &swapper_space, to avoid build failure with CONFIG_SWAP=n.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 · 7cc2ed05

Committed by Linus Torvalds on Jun 16, 2011

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
  kbuild: Call depmod.sh via shell
  perf: clear out make flags when calling kernel make kernelver

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 · 8dac6bee

Committed by Linus Torvalds on Jun 16, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  AFS: Use i_generation not i_version for the vnode uniquifier
  AFS: Set s_id in the superblock to the volume name
  vfs: Fix data corruption after failed write in __block_write_begin()
  afs: afs_fill_page reads too much, or wrong data
  VFS: Fix vfsmount overput on simultaneous automount
  fix wrong iput on d_inode introduced by e6bc45d6
  Delay struct net freeing while there's a sysfs instance refering to it
  afs: fix sget() races, close leak on umount
  ubifs: fix sget races
  ubifs: split allocation of ubifs_info into a separate function
  fix leak in proc_set_super()

Merge branch 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-3.x · f8f44f09

Committed by Linus Torvalds on Jun 16, 2011

* 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-3.x:
  sh: sh7724: Add USBHS DMAEngine support
  sh: ecovec: Add renesas_usbhs support
  sh, exec: remove redundant set_fs(USER_DS)
  drivers: sh: resume enabled clocks fix
  dmaengine: shdma: SH_DMAC_MAX_CHANNELS message fix
  sh: Fix up xchg/cmpxchg corruption with gUSA RB.
  sh: Remove compressed kernel libgcc dependency.
  sh: fix wrong icache/dcache address-array start addr in cache-debugfs.

Merge branch 'rmobile-fixes-for-linus' of... · f49cc57c

Committed by Linus Torvalds on Jun 16, 2011

Merge branch 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-3.x

* 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-3.x:
  ARM: mach-shmobile: mackerel: tidyup usbhs driver settings
  ARM: mach-shmobile: Correct SCIF port types for SH7367.
  ARM: mach-shmobile: sh73a0 gic_arch_extn.irq_set_wake() fix
  ARM: mach-shmobile: Mackerel USB platform data update
  ARM: mach-shmobile: AG5EVM SDHI1 platform data update

Merge branch 'fbdev-fixes-for-linus' of... · f4ef0842

Committed by Linus Torvalds on Jun 16, 2011

Merge branch 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-3.x

* 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-3.x:
  fbdev: sh_mobile_hdmi: fix regression: statically enable RTPM
  fbdev/atyfb: Fix 2 defined-but-not-used warnings
  efifb: Fix call to wrong unregister function
  video: s3c-fb: move enabling channel for window
  video: s3c-fb: fix virtual resolution checking
  video: s3c-fb: fix misleading kfree in remove function

Merge branch 'for-linus' of... · df9d030c

Committed by Linus Torvalds on Jun 16, 2011

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  SELinux: skip file_name_trans_write() when policy downgraded.
  selinux: fix case of names with whitespace/multibytes on /selinux/create

Jun 16, 2011 · 33 commits

AFS: Use i_generation not i_version for the vnode uniquifier · d6e43f75

Committed by David Howells on Jun 14, 2011

Store the AFS vnode uniquifier in the i_generation field, not the i_version
field of the inode struct.  i_version can then be given the AFS data version
number.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

AFS: Set s_id in the superblock to the volume name · 2e41ae22

Committed by David Howells on Jun 14, 2011

Set s_id in the superblock to the name of the AFS volume that this superblock
corresponds to.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: Fix data corruption after failed write in __block_write_begin() · f9f07b6c

Committed by Jan Kara on Jun 14, 2011

I've got a report of a file corruption from fsxlinux on ext3. The important
operations to the page were:
mapwrite to a hole
partial write to the page
read - found the page zeroed from the end of the normal write

The culprit seems to be that if get_block() fails in __block_write_begin()
(e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page).
Thus when we retry the write, the logic in __block_write_begin() thinks zeroing
of the page is needed and overwrites old data.  In fact, I don't see why we
should ever need to zero the uptodate bit here - either the page was uptodate
when we entered __block_write_begin() and it should stay so when we leave it,
or it was not uptodate and no one had the right to set it uptodate during
__block_write_begin() so it remains !uptodate when we leave as well. So just
remove clearing of the bit.
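
A deliberately coarse model of the failure, assuming a page reduced to a few bytes plus its uptodate flag (the `model_*` names are hypothetical, and zeroing the whole page stands in for zeroing the parts outside the write): a failed get_block() that clears uptodate makes the retry zero data that was valid.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical page model: a few bytes of data plus the uptodate flag. */
struct model_page {
	char data[8];
	bool uptodate;
};

/*
 * Simplified write_begin: a page that is not uptodate gets zeroed before the
 * write. If block allocation fails, the old behaviour (selected by
 * clear_uptodate_on_error) also clears the uptodate flag.
 */
static int model_write_begin(struct model_page *page, bool get_block_fails,
			     bool clear_uptodate_on_error)
{
	if (!page->uptodate)
		memset(page->data, 0, sizeof(page->data));
	if (get_block_fails) {
		if (clear_uptodate_on_error)
			page->uptodate = false;	/* the removed ClearPageUptodate() */
		return -ENOSPC;
	}
	return 0;
}
```

With the flag left alone on error, the retry after a transient ENOSPC finds the page still uptodate and leaves its contents intact.
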
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

afs: afs_fill_page reads too much, or wrong data · 5e7f2337

Committed by Anton Blanchard on Jun 13, 2011

afs_fill_page should read the page that is about to be written but
the current implementation has a number of issues. If we aren't
extending the file we always read PAGE_CACHE_SIZE at offset 0. If we
are extending the file we try to read the entire file.

Change afs_fill_page to read PAGE_CACHE_SIZE at the right offset,
clamped to i_size.

While here, avoid calling afs_fill_page when we are doing a
PAGE_CACHE_SIZE write.
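
A sketch of the corrected read-length arithmetic, assuming a 4096-byte page and hypothetical `model_*` helper names: read one page at the page's own file offset, clamp to i_size, and skip the fill entirely when the write covers the whole page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the example */

/* Bytes to read when filling the page that starts at file position pos:
 * one page at that offset, clamped so it never runs past i_size. */
static uint64_t model_fill_len(uint64_t pos, uint64_t i_size)
{
	uint64_t len;

	assert(pos % MODEL_PAGE_SIZE == 0);
	if (pos >= i_size)
		return 0;	/* page lies wholly beyond EOF: nothing to read */
	len = i_size - pos;
	return len < MODEL_PAGE_SIZE ? len : MODEL_PAGE_SIZE;
}

/* A write that covers the whole page overwrites all of it, so no fill. */
static bool model_needs_fill(uint32_t offset_in_page, uint32_t write_len)
{
	return !(offset_in_page == 0 && write_len == MODEL_PAGE_SIZE);
}
```

For a 10000-byte file, the third page (offset 8192) needs only 1808 bytes, and a page past EOF needs none.
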
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

staging: fix iio builds when IIO_RING_BUFFER is not enabled · e1d76719

Committed by Randy Dunlap on Jun 14, 2011

Fix build by moving enum list outside of
#ifdef CONFIG_IIO_RING_BUFFER.

  drivers/staging/iio/accel/adis16201_core.c:413: error: 'ADIS16201_SCAN_SUPPLY' undeclared here (not in a function)
  drivers/staging/iio/accel/adis16201_core.c:417: error: 'ADIS16201_SCAN_TEMP' undeclared here (not in a function)
  ..

  drivers/staging/iio/accel/adis16203_core.c:374: error: 'ADIS16203_SCAN_SUPPLY' undeclared here (not in a function)
  drivers/staging/iio/accel/adis16203_core.c:378: error: 'ADIS16203_SCAN_AUX_ADC' undeclared here (not in a function)
  ..
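
A hypothetical driver fragment illustrating the shape of the fix (the `MODEL_*` names are invented, not the adis162xx identifiers): the scan-index enum is referenced by a channel table that is compiled unconditionally, so the enum has to live outside the `#ifdef`.

```c
#include <assert.h>

/* Needed by the channel table below whether or not ring buffers are built. */
enum model_scan {
	MODEL_SCAN_SUPPLY,
	MODEL_SCAN_TEMP,
};

#ifdef CONFIG_IIO_RING_BUFFER
/* Ring-buffer-only code would go here. */
#endif

/* Compiled in every configuration, so it must see the enum above. */
static const enum model_scan model_channels[] = {
	MODEL_SCAN_SUPPLY,
	MODEL_SCAN_TEMP,
};
```
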
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: linux-iio@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

VFS: Fix vfsmount overput on simultaneous automount · 8aef1884

Committed by Al Viro on Jun 16, 2011

[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that lock_mount() drops the caller's reference to the
mountpoint's vfsmount in the case where it finds something already mounted on
the mountpoint as it transits to the mounted filesystem and replaces path->mnt
with the new mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know
this.

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that a reference to path->mnt will be
grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
keep track of whether such a reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt.  That assumption is almost correct - it breaks in the case of
racing automounts and in an even harder-to-hit race between following a mountpoint
and a couple of mount --move.  The thing is, we don't need to make that
assumption at all - after the end of the loop in follow_managed() we can check
if path->mnt has ended up unchanged and do mntput() if needed.
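
A minimal model of that end-of-loop check, assuming a vfsmount reduced to a refcount (the `model_*` types and the `need_mntput` parameter are hypothetical stand-ins, not the kernel's code): if a reference was taken but path->mnt is back where it started, drop it instead of assuming that cannot happen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a vfsmount refcount and a path. */
struct model_mnt {
	int count;
};

struct model_path {
	struct model_mnt *mnt;
};

static void model_mntput(struct model_mnt *mnt)
{
	assert(mnt->count > 0);	/* catches an unbalanced put */
	mnt->count--;
}

/*
 * End of the walk: callers expect an extra reference on path->mnt if and only
 * if path->mnt changed. If a reference was taken along the way but path->mnt
 * is back to the original (e.g. after a racing automount), drop it here.
 */
static void model_finish_walk(struct model_path *path,
			      const struct model_mnt *orig, bool need_mntput)
{
	if (need_mntput && path->mnt == orig)
		model_mntput(path->mnt);
}
```

Checking the outcome after the loop, rather than predicting it inside, keeps the refcount correct however the race resolves.
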

The BUG can be reproduced with the following test program:

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <sys/wait.h>
	int main(int argc, char **argv)
	{
		int pid, ws;
		struct stat buf;
		pid = fork();
		stat(argv[1], &buf);
		if (pid > 0) wait(&ws);
		return 0;
	}

and the following procedure:

 (1) Mount an NFS volume that on the server has something else mounted on a
     subdirectory.  For instance, I can mount / from my server:

	mount warthog:/ /mnt -t nfs4 -r

     On the server /data has another filesystem mounted on it, so NFS will see
     a change in FSID as it walks down the path, and will mark /mnt/data as
     being a mountpoint.  This will cause the automount code to be triggered.

     !!! Do not look inside the mounted fs at this point !!!

 (2) Run the above program on a file within the submount to generate two
     simultaneous automount requests:

	/tmp/forkstat /mnt/data/testfile

 (3) Unmount the automounted submount:

	umount /mnt/data

 (4) Unmount the original mount:

	umount /mnt

     At this point the kernel should throw a BUG with something like the
     following:

	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

 [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
 [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
 [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
 [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
 [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
 [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
 [<ffffffff811862a8>] mntput+0x3b/0x44
 [<ffffffff81186d87>] release_mounts+0xa2/0xbf
 [<ffffffff811876af>] sys_umount+0x47a/0x4ba
 [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
 [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b

as do_umount() is inlined.  However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.
Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fix wrong iput on d_inode introduced by · 50338b88

Committed by Török Edwin on Jun 16, 2011

Git bisection shows that commit e6bc45d6 causes
BUG_ONs under high I/O load:

kernel BUG at fs/inode.c:1368!
[ 2862.501007] Call Trace:
[ 2862.501007]  [<ffffffff811691d8>] d_kill+0xf8/0x140
[ 2862.501007]  [<ffffffff81169c19>] dput+0xc9/0x190
[ 2862.501007]  [<ffffffff8115577f>] fput+0x15f/0x210
[ 2862.501007]  [<ffffffff81152171>] filp_close+0x61/0x90
[ 2862.501007]  [<ffffffff81152251>] sys_close+0xb1/0x110
[ 2862.501007]  [<ffffffff814c14fb>] system_call_fastpath+0x16/0x1b

A reliable way to reproduce this bug is:
Log in to KDE, run 'rsnapshot sync', then 'apt-get install openjdk-6-jdk'
and 'apt-get remove openjdk-6-jdk'.

The buggy part of the patch is this:
	struct inode *inode = NULL;
.....
-               if (nd.last.name[nd.last.len])
-                       goto slashes;
                inode = dentry->d_inode;
-               if (inode)
-                       ihold(inode);
+               if (nd.last.name[nd.last.len] || !inode)
+                       goto slashes;
+               ihold(inode);
...
	if (inode)
		iput(inode);	/* truncate the inode here */

If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken),
and dentry->d_inode is non-NULL, then this code now does an additional iput on
the inode, which is wrong.

Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0.
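
A simplified model of the fixed control flow, assuming an inode reduced to a refcount (`model_*` names and the `trailing_slash` flag are hypothetical; the real code chooses the errno from the dentry): the pointer is taken only after the trailing-slash check, so the cleanup iput() always pairs with an ihold().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_inode {
	int count;
};

static void model_ihold(struct model_inode *inode) { inode->count++; }

static void model_iput(struct model_inode *inode)
{
	assert(inode->count > 0);	/* catches an unbalanced put */
	inode->count--;
}

/* trailing_slash stands in for nd.last.name[nd.last.len] != 0. */
static int model_unlink(struct model_inode *d_inode, bool trailing_slash)
{
	struct model_inode *inode = NULL;
	int error = 0;

	if (trailing_slash) {
		error = -ENOTDIR;	/* simplified error choice */
		goto out;
	}
	inode = d_inode;
	if (!inode) {
		error = -ENOENT;
		goto out;
	}
	model_ihold(inode);
	/* vfs_unlink() would run here */
out:
	if (inode)
		model_iput(inode);	/* truncate the inode here */
	return error;
}
```
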

Reference: https://lkml.org/lkml/2011/6/15/50
Reported-by: Norbert Preining <preining@logic.at>
Reported-by: Török Edwin <edwintorok@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

mm: get rid of the most spurious find_vma_prev() users · 9be34c9d

Committed by Linus Torvalds on Jun 16, 2011

We have some users of this function that date back to before the vma
list was doubly linked, and are just silly.  These days, you can find
the previous vma by just following the vma->vm_prev pointer.

In some cases you don't need any find_vma() lookup at all, and in other
cases you're better off with the regular "find_vma()" that uses the vma
cache front-end lookup.

Some "find_vma_prev()" users are still valid, though.  For example, in
the case of a stack that grows up, it can be the case that we don't find
any 'vma' at all (because we're looking up an address that is past the
last vma), and that the stack that we want to grow is the 'prev' vma.

But that kind of special case aside, we generally should prefer to use
'find_vma()'.
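
A sketch of the two lookups on a hypothetical doubly linked, address-sorted VMA list (`model_*` names are invented; a linear walk replaces the kernel's rbtree and cache): normally the predecessor is just `vma->vm_prev`, and only the past-the-end case needs a dedicated "prev" lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly linked VMA list, sorted by address. */
struct model_vma {
	unsigned long vm_start, vm_end;
	struct model_vma *vm_next, *vm_prev;
};

/* Like find_vma(): first VMA with vm_end > addr, or NULL. */
static struct model_vma *model_find_vma(struct model_vma *head, unsigned long addr)
{
	for (struct model_vma *vma = head; vma; vma = vma->vm_next)
		if (vma->vm_end > addr)
			return vma;
	return NULL;
}

/* The case that still needs "prev": addr is past the last VMA, so the
 * lookup returns NULL and the stack to grow is the last VMA. */
static struct model_vma *model_find_vma_prev(struct model_vma *head, unsigned long addr,
					     struct model_vma **pprev)
{
	struct model_vma *prev = NULL, *vma;

	for (vma = head; vma && vma->vm_end <= addr; vma = vma->vm_next)
		prev = vma;
	*pprev = prev;
	return vma;
}
```
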

Noticed due to a totally unrelated POWER memory corruption bug that just
happened to hit in 'find_vma_prev()' and made me go "Hmm - why are we
using that function here?".
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sh: sh7724: Add USBHS DMAEngine support · 261a9af6

Committed by Kuninori Morimoto on Jun 15, 2011

Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

sh: ecovec: Add renesas_usbhs support · fb2e7394

Committed by Kuninori Morimoto on Jun 15, 2011

Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm · 19a1166f

Committed by Linus Torvalds on Jun 15, 2011

* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: footbridge: fix clock event support
  ARM: footbridge: fix debug macros
  ARM: initrd: disable initrds outside of memory
  ARM: extend Code: line by one 16-bit quantity for Thumb instructions
  ARM: 6955/1: cmpxchg syscall should data abort if page not write
  ARM: 6954/1: zImage: fix Thumb2 breakage
  ARM: 6953/1: DT: don't try to access physical address zero
  ARM: 6949/2: mach-u300: fix compilaton warning in IO accessors
  Revert "ARM: 6944/1: mm: allow ASID 0 to be allocated to tasks"
  Revert "ARM: 6943/1: mm: use TTBR1 instead of reserved context ID"
  davinci: make PCM platform devices static
  arm: davinci: Fix fallout from generic irq chip conversion
  ARM: 6894/1: mmci: trigger card detect IRQs on falling and rising edges
  ARM: 6952/1: fix lockdep warning of "unannotated irqs-off"
  ARM: 6951/1: include .bss in memory layout information
  ARM: 6948/1: Fix .size directives for __arm{7,9}tdmi_proc_info
  ARM: 6947/2: mach-u300: fix compilation error in timer
  ARM: 6946/1: vexpress: move v2m clock init to init_early
  ARM: mx51/sdma: Check the chip revision in run-time
  arm: mxs: include asm/processor.h for cpu_relax()

Revert "fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP" · 13fca640

Committed by Linus Torvalds on Jun 15, 2011

This reverts commit 7f81c889.

It turns out that it's not actually a build-time check on x86-64 UML,
which does some seriously crazy stuff with VM_STACK_FLAGS.

The VM_STACK_FLAGS define depends on the arch-supplied
VM_STACK_DEFAULT_FLAGS value, and on x86-64 UML we have

  arch/um/sys-x86_64/shared/sysdep/vm-flags.h:

	#define VM_STACK_DEFAULT_FLAGS \
		(test_thread_flag(TIF_IA32) ? vm_stack_flags32 : vm_stack_flags)

	#define VM_STACK_DEFAULT_FLAGS vm_stack_flags

(yes, seriously: two different #define's for that thing, with the first
one being inside an "#ifdef TIF_IA32")

It's possible that it is UML that should just be fixed in this area, but
for now let's just undo the (very small) optimization.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation: fix cgroup typos and formatting · 67de0162

Committed by Jörg Sommer on Jun 15, 2011

Fix format and spelling.
Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation: update cgroupfs mount point · f6e07d38

Committed by Jörg Sommer on Jun 15, 2011

According to commit 676db4af ("cgroupfs: create /sys/fs/cgroup to
mount cgroupfs on") the canonical mountpoint for the cgroup filesystem
is /sys/fs/cgroup.  Hence, this should be used in the documentation.
Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation: update kmemleak supported archs · 06a2c45d

Committed by Maxin B. John on Jun 15, 2011

Instead of listing the architectures that are supported by
kmemleak in Documentation/kmemleak.txt, just refer people to
the list of supported architectures in lib/Kconfig.debug so
that Documentation/kmemleak.txt does not need more updates
for this.
Signed-off-by: Maxin B. John <maxin.john@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation: update printk-formats.txt · 04c55715

Committed by Andrew Murray on Jun 15, 2011

This patch updates the incomplete documentation concerning the printk
extended format specifiers.
Signed-off-by: Andrew Murray <amurray@mpc-data.co.uk>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'sched-urgent-for-linus' of... · a1b6ae8e

Committed by Linus Torvalds on Jun 15, 2011

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Check if lowest_mask is initialized in find_lowest_rq()
  sched: Fix need_resched() when checking peempt

alpha: fix several security issues · 21c5977a

Committed by Dan Rosenberg on Jun 15, 2011

Fix several security issues in Alpha-specific syscalls.  Untested, but
mostly trivial.

1. Signedness issue in osf_getdomainname allows copying out-of-bounds
kernel memory to userland.

2. Signedness issue in osf_sysinfo allows copying large amounts of
kernel memory to userland.

3. Typo (?) in osf_getsysinfo bounds minimum instead of maximum copy
size, allowing copying large amounts of kernel memory to userland.

4. Usage of user pointer in osf_wait4 while under KERNEL_DS allows
privilege escalation via writing return value of sys_wait4 to kernel
memory.
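
A generic illustration of the signedness pattern behind issues 1 and 2 (hypothetical `model_*` helpers, not the Alpha syscall code): a signed, user-supplied length checked only against an upper bound lets a negative value through, which then converts to a huge copy size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy pattern: only the upper bound is checked on a signed length. */
static size_t model_copy_len_buggy(int len, size_t bufsize)
{
	if (len > (int)bufsize)
		len = (int)bufsize;
	return (size_t)len;	/* -1 becomes SIZE_MAX here */
}

/* One way to close the hole: go unsigned before clamping, so negative
 * inputs become large values that are clamped to bufsize. */
static size_t model_copy_len_fixed(int len, size_t bufsize)
{
	size_t n = (size_t)(unsigned int)len;

	return n < bufsize ? n : bufsize;
}
```

Rejecting negative lengths with an error would be an equally valid fix; clamping is shown only because it keeps the sketch short.
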
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/misc/apds990x.c: apds990x_chip_on() should depend on CONFIG_PM || CONFIG_PM_RUNTIME · ec8f9cea

Committed by Geert Uytterhoeven on Jun 15, 2011

Fixes this warning:

  drivers/misc/apds990x.c: At top level:
  drivers/misc/apds990x.c:613: warning: `apds990x_chip_on' defined but not used
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Samu Onkalo <samu.p.onkalo@nokia.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ksm: fix NULL pointer dereference in scan_get_next_rmap_item() · 2b472611

Committed by Hugh Dickins on Jun 15, 2011

Andrea Righi reported a case where an exiting task can race against
ksmd::scan_get_next_rmap_item (http://lkml.org/lkml/2011/6/1/742) easily
triggering a NULL pointer dereference in ksmd.

ksm_scan.mm_slot == &ksm_mm_head with only one registered mm

CPU 1 (__ksm_exit)		CPU 2 (scan_get_next_rmap_item)
 				list_empty() is false
lock				slot == &ksm_mm_head
list_del(slot->mm_list)
(list now empty)
unlock
				lock
				slot = list_entry(slot->mm_list.next)
				(list is empty, so slot is still ksm_mm_head)
				unlock
				slot->mm == NULL ... Oops

Close this race by revalidating that the new slot is not simply the list
head again.
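
A single-threaded model of the interleaving above, assuming a minimal `list.h`-style circular list and a hypothetical `model_slot` whose head entry has a NULL mm: once the racing exit has emptied the list, "next" from the head is the head itself, which the revalidation catches.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of <linux/list.h>. */
struct model_list {
	struct model_list *next, *prev;
};

static void model_list_init(struct model_list *head)
{
	head->next = head->prev = head;
}

static void model_list_add_tail(struct model_list *node, struct model_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void model_list_del(struct model_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/* Hypothetical mm_slot: the list head is itself a slot whose mm is NULL. */
struct model_slot {
	struct model_list mm_list;
	const char *mm;
};

#define model_slot_of(ptr) \
	((struct model_slot *)((char *)(ptr) - offsetof(struct model_slot, mm_list)))

/* Advance from slot; if the list emptied under us, "next" is the head again,
 * so report no slot instead of dereferencing the head's NULL mm. */
static struct model_slot *model_next_slot(struct model_slot *slot,
					  struct model_slot *head)
{
	struct model_slot *next = model_slot_of(slot->mm_list.next);

	return next == head ? NULL : next;
}
```
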

Andrea's test case:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define BUFSIZE getpagesize()

int main(int argc, char **argv)
{
	void *ptr;

	if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
		perror("posix_memalign");
		exit(1);
	}
	if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
		perror("madvise");
		exit(1);
	}
	*(char *)NULL = 0;	/* deliberately crash so the task exits while registered with KSM */

	return 0;
}
Reported-by: Andrea Righi <andrea@betterlinux.com>
Tested-by: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rtc: fix build warnings in defconfigs · c7cbb022

Committed by Wanlong Gao on Jun 15, 2011

RTC_CLASS has been changed to bool, so 'm' is no longer a valid value for it.
Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/tty/serial/pch_uart.c: don't oops if dmi_get_system_info returns NULL · fb139dfe

Committed by Alexander Stein on Jun 15, 2011

If dmi_get_system_info() returns NULL, pch_uart_init_port() will
dereference a NULL pointer.

This oops was observed on an Atom-based board which has no BIOS, but
a bootloader which doesn't provide DMI data.
Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: Greg KH <gregkh@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/char/hpet.c: fix periodic-emulation for delayed interrupts · 273ef950

Committed by Nils Carlson on Jun 15, 2011

When interrupts are delayed due to interrupt masking or due to other
interrupts being serviced, the HPET periodic emulation would fail.  This
happened because given an interval t and a time for the current interrupt
m we would compute the next time as t + m.  This works until we are
delayed for > t, in which case we would be writing a new value which is in
fact in the past.

This can be solved by computing the next time instead as (k * t) + m where
k is large enough to be in the future.  The exact computation of k is
described in a comment to the code.

More detail:

Assuming an interval of 5 between each expected interrupt we have a normal
case of

t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
t5: interrupt, read t5 from comparator, set next interrupt t5 + 5
t10: interrupt, read t10 from comparator, set next interrupt t10 + 5
...

So, what happens when the interrupt is serviced too late?

t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
t11: delayed interrupt serviced, read t5 from comparator, set next
interrupt t5 + 5, which is in the past!
... counter loops ...
t10: Much much later, get the next interrupt.

This can happen either because we have interrupts masked for too long
(some stupid driver goes on a printk rampage) or just because we are
pushing the limits of the interval (too small a period), or, most
probably, both.

My solution is to read the main counter as well and set the next interrupt
to occur at the right interval, for example:

t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
t11: delayed interrupt serviced, read t5 from comparator, set next
interrupt t15 as t10 has been missed.
t15: back on track.
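
A sketch of the "(k * t) + m" computation, assuming a 32-bit wrapping counter, a hypothetical write-latency margin `delta`, and fewer than 2^32 ticks elapsed since the last comparator value; it is not the driver's code. It picks the smallest k that puts the next deadline strictly after `now + delta`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * next = cmp + k * t, with k the smallest positive integer that puts the
 * deadline strictly after now + delta. Unsigned 32-bit arithmetic keeps the
 * subtraction correct across counter wraparound.
 */
static uint32_t model_next_comparator(uint32_t cmp, uint32_t now, uint32_t t,
				      uint32_t delta)
{
	uint32_t elapsed;

	assert(t > 0);
	elapsed = now + delta - cmp;	/* ticks since the last scheduled interrupt */
	return cmp + t * (elapsed / t + 1);
}
```

With the numbers from the delayed example (t = 5, comparator read as t5, serviced at t11, no margin) this yields t15; in the normal case it reduces to cmp + t.
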
Signed-off-by: Nils Carlson <nils.carlson@ericsson.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation/feature-removal-schedule.txt: remove ns_cgroup from feature-removal-schedule.txt · 31b5f8ee

Committed by akpm@linux-foundation.org on Jun 15, 2011

Commit a77aea92 ("cgroup: remove the ns_cgroup") removed the
ns_cgroup but it forgot to remove the related doc in
feature-removal-schedule.txt.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: compaction: abort compaction if too many pages are isolated and caller is asynchronous V2 · f9e35b3b

Committed by Mel Gorman on Jun 15, 2011

Asynchronous compaction is used when promoting to huge pages.  This is all
very nice, but if a number of processes are compacting memory, a
large number of pages can be isolated.  An "asynchronous" process can
stall for long periods of time as a result, with a user reporting that
Firefox can stall for tens of seconds.  This patch aborts asynchronous
compaction if too many pages are isolated, as it's better to fail a
hugepage promotion than to stall a process.

[minchan.kim@gmail.com: return COMPACT_PARTIAL for abort]
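
A model of the abort-versus-wait decision, assuming a hypothetical zone snapshot and a simulated wait that halves the isolated count (standing in for other compactors finishing); the threshold mirrors the half-of-LRU heuristic used by mm/compaction.c's too_many_isolated(), but the `model_*` code is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone snapshot. */
struct model_zone {
	unsigned long inactive, active, isolated;
};

/* Too many isolated: isolated pages exceed half of the LRU pages. */
static bool model_too_many_isolated(const struct model_zone *z)
{
	return z->isolated > (z->inactive + z->active) / 2;
}

/* Returns true if isolation may proceed. Synchronous callers wait;
 * asynchronous callers abort at once (reported upward as COMPACT_PARTIAL). */
static bool model_may_isolate(struct model_zone *z, bool sync, int *waits)
{
	while (model_too_many_isolated(z)) {
		if (!sync)
			return false;
		(*waits)++;
		z->isolated /= 2;	/* simulated wait: other compactors drain pages */
	}
	return true;
}
```
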
Reported-and-tested-by: Ury Stankevich <urykhy@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: vmscan: do not use page_count without a page pin · d179e84b

Committed by Andrea Arcangeli on Jun 15, 2011

It is unsafe to run page_count during the physical pfn scan because
compound_head could trip on a dangling pointer when reading
page->first_page if the compound page is being freed by another CPU.

[mgorman@suse.de: split out patch]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: compaction: ensure that the compaction free scanner does not move to the next zone · 7454f4ba

Committed by Mel Gorman on Jun 15, 2011

Compaction works with two scanners, a migration and a free scanner.  When
the scanners crossover, migration within the zone is complete.  The
location of the scanner is recorded on each cycle to avoid excessive
scanning.

When a zone is small and mostly reserved, it's very easy for the migration
scanner to be close to the end of the zone.  Then the following situation
can occur:

  o migration scanner isolates some pages near the end of the zone
  o free scanner starts at the end of the zone but finds that the
    migration scanner is already there
  o free scanner gets reinitialised for the next cycle as
    cc->migrate_pfn + pageblock_nr_pages
    moving the free scanner into the next zone
  o migration scanner moves into the next zone

When this happens, NR_ISOLATED accounting goes haywire because some of the
accounting happens against the wrong zone.  One zone's counter remains
positive while the other goes negative even though the overall global
count is accurate.  This was reported on X86-32 with !SMP because !SMP
allows the negative counters to be visible.  The underlying bug should
theoretically be possible on SMP too.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

compaction: checks correct fragmentation index · a582a738

Committed by Shaohua Li on Jun 15, 2011

fragmentation_index() returns -1000 when the allocation might succeed.
This doesn't match the comment and code in compaction_suitable(). I
think compaction_suitable() should return COMPACT_PARTIAL in the -1000
case, because in this case the allocation could succeed depending on
watermarks.

The impact of this is that compaction starts and compact_finished() is
called which rechecks the watermarks and the free lists.  It should have
the same result in that compaction should not start but is more expensive.
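
A sketch of just the fragmentation-index part of the decision, assuming the 0..1000 index scale, -1000 for "a suitable free block exists", and a caller-supplied threshold (500 in the tests is an assumed value); the `model_*` names are hypothetical and the earlier watermark precheck is omitted.

```c
#include <assert.h>
#include <stdbool.h>

enum model_compact_result {
	MODEL_COMPACT_SKIPPED,	/* compaction would not help */
	MODEL_COMPACT_CONTINUE,	/* go ahead and compact */
	MODEL_COMPACT_PARTIAL,	/* allocation may already succeed */
};

/* Low indexes mean failure is due to lack of memory, high indexes mean
 * fragmentation; -1000 means the allocation might succeed as things are. */
static enum model_compact_result
model_compaction_suitable(int fragindex, int threshold, bool watermark_ok)
{
	if (fragindex >= 0 && fragindex <= threshold)
		return MODEL_COMPACT_SKIPPED;
	if (fragindex == -1000 && watermark_ok)	/* the value the fix checks for */
		return MODEL_COMPACT_PARTIAL;
	return MODEL_COMPACT_CONTINUE;
}
```
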
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memory-failure.c: fix page isolated count mismatch · 5db8a73a

Committed by Minchan Kim on Jun 15, 2011

Pages isolated for migration are accounted with the vmstat counters
NR_ISOLATED_[ANON|FILE].  Callers of migrate_pages() are expected to
increment these counters when pages are isolated from the LRU.  Once the
pages have been migrated, they are put back on the LRU or freed and the
isolated count is decremented.

Memory failure does not properly account for the pages it isolates, causing
the NR_ISOLATED counters to become negative.  On SMP builds, this goes
unnoticed as negative counters are treated as 0 due to expected per-cpu
drift.  On UP builds, the counter is treated by too_many_isolated() as a
large value causing processes to enter D state during page reclaim or
compaction.  This patch accounts for pages isolated by memory failure
correctly.
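
A model of why the same negative counter is harmless on SMP but stalls UP, assuming a simplified counter read (SMP clamps negatives to 0 for per-cpu drift, UP returns the raw value as unsigned) and a reclaim-side check in the style of too_many_isolated() in mm/vmscan.c; the `model_*` names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* SMP clamps negative values to 0; UP returns the raw value, so a
 * negative count is seen as a huge unsigned number. */
static unsigned long model_zone_page_state(long raw, bool smp)
{
	if (smp && raw < 0)
		raw = 0;
	return (unsigned long)raw;
}

/* Reclaim throttles while isolated pages exceed inactive pages. */
static bool model_too_many_isolated(unsigned long inactive, unsigned long isolated)
{
	return isolated > inactive;
}
```

On UP a count of -1 reads as ULONG_MAX, so the throttle condition never clears and callers sit in D state.
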

[mel@csn.ul.ie: rewrote changelog]
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL · d2c32258

Committed by Josh Triplett on Jun 15, 2011

CONFIG_CONSTRUCTORS controls support for running constructor functions at
kernel init time. According to commit b99b87f7 ("kernel:
constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this. However,
CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it,
and CONFIG_GCOV_KERNEL depends on it. Instead, default it to n and have
CONFIG_GCOV_KERNEL select it, so that the normal case of
CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n.

Observed in the short list of =y values in a minimal kernel configuration.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: add entry for legacy eeprom driver · b0461a44

Committed by Jean Delvare on Jun 15, 2011

I shall maintain the legacy eeprom driver, until we finally get rid of it.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: avoid percpu cached charge draining at softlimit · fbc29a25

Committed by KAMEZAWA Hiroyuki on Jun 15, 2011

Based on Michal Hocko's comment.

We do not drain per-cpu cached charges during soft limit reclaim, because
background reclaim doesn't care about charges.  It tries to free some
memory, and draining charges would not free any.

Cached charges could influence only the selection of the biggest soft limit
offender, but since the call is made only after that selection has already
been done, draining makes no difference.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: fix percpu cached charge draining frequency · 26fe6168

Committed by KAMEZAWA Hiroyuki on Jun 15, 2011

For performance, the memory cgroup caches some "charge" from res_counter in
a per-cpu cache.  This works well, but because it is a cache, it needs to be
flushed in some cases.  Typical cases are

   1. when someone hits the limit.

   2. when rmdir() is called and the charges need to drop to 0.

But case "1" has a problem.

Recently, on large SMP machines, we see many kworker runs caused by
flushing memcg's cache.  The flaw in the implementation is that the drain
code is called even on a cpu whose cache holds charges for a memcg
unrelated to the memcg that hit the limit.

This patch does the following:
        A) checks whether the percpu cache contains useful data.
        B) checks that no other asynchronous percpu drain is running.
        C) doesn't call the local cpu callback.

(*) This patch avoids changing the calling condition for the hard limit.

When I run "cat 1Gfile > /dev/null" under a memcg with a 300M limit, top shows:

[Before]
13767 kamezawa  20   0 98.6m  424  416 D 10.0  0.0   0:00.61 cat
   58 root      20   0     0    0    0 S  0.6  0.0   0:00.09 kworker/2:1
   60 root      20   0     0    0    0 S  0.6  0.0   0:00.08 kworker/4:1
    4 root      20   0     0    0    0 S  0.3  0.0   0:00.02 kworker/0:0
   57 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/1:1
   61 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/5:1
   62 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/6:1
   63 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/7:1

[After]
 2676 root      20   0 98.6m  416  416 D  9.3  0.0   0:00.87 cat
 2626 kamezawa  20   0 15192 1312  920 R  0.3  0.0   0:00.28 top
    1 root      20   0 19384 1496 1204 S  0.0  0.0   0:00.66 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0

[akpm@linux-foundation.org: make percpu_charge_mutex static, tweak comments]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
