提交 · f1ac9f6bfea6f21e8ab6dbbe46879d62a6fba8c0 · openanolis / cloud-kernel

09 9月, 2009 3 次提交

Simplify exec_permission_lite() further · f1ac9f6b

由 Linus Torvalds 提交于 8月 28, 2009

This function is only called for path components that are already known
to be directories (they have a '->lookup' method).  So don't bother
doing that whole S_ISDIR() testing, the whole point of the 'lite()'
version is that we know that we are looking at a directory component,
and that we're only checking name lookup permission.
Reviewed-by: NJames Morris <jmorris@namei.org>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f1ac9f6b

Simplify exec_permission_lite() logic · b7a437b0

由 Linus Torvalds 提交于 8月 28, 2009

Instead of returning EAGAIN and having the caller do something
special for that case,  just do the special case directly.
Reviewed-by: NJames Morris <jmorris@namei.org>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b7a437b0

Do not call 'ima_path_check()' for each path component · e8e66ed2

由 Linus Torvalds 提交于 8月 28, 2009

Not only is that a supremely timing-critical path, but it's hopefully
some day going to be lockless for the common case, and ima can't do
that.

Plus the integrity code doesn't even care about non-regular files, so it
was always a total waste of time and effort.
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Acked-by: NMimi Zohar <zohar@us.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e8e66ed2

07 9月, 2009 1 次提交

IMA: update ima_counts_put · acd0c935

由 Mimi Zohar 提交于 9月 04, 2009

- As ima_counts_put() may be called after the inode has been freed,
verify that the inode is not NULL, before dereferencing it.

- Maintain the IMA file counters in may_open() properly, decrementing
any counter increments on subsequent errors.
Reported-by: NCiprian Docan <docan@eden.rutgers.edu>
Reported-by: NJ.R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: NMimi Zohar <zohar@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com
Signed-off-by: NJames Morris <jmorris@namei.org>

acd0c935

06 9月, 2009 2 次提交

ext2: fix unbalanced kmap()/kunmap() · 9de6886e

由 Nicolas Pitre 提交于 9月 05, 2009

In ext2_rename(), dir_page is acquired through ext2_dotdot().  It is
then released through ext2_set_link() but only if old_dir != new_dir.
Failing that, the pkmap reference count is never decremented and the
page remains pinned forever.  Repeat that a couple times with highmem
pages and all pkmap slots get exhausted, and every further kmap() calls
end up stalling on the pkmap_map_wait queue at which point the whole
system comes to a halt.
Signed-off-by: NNicolas Pitre <nico@marvell.com>
Acked-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9de6886e

exec: do not sleep in TASK_TRACED under ->cred_guard_mutex · a2a8474c

由 Oleg Nesterov 提交于 9月 05, 2009

Tom Horsley reports that his debugger hangs when it tries to read
/proc/pid_of_tracee/maps, this happens since

	"mm_for_maps: take ->cred_guard_mutex to fix the race with exec"
	04b836cbf19e885f8366bccb2e4b0474346c02d

commit in 2.6.31.

But the root of the problem lies in the fact that do_execve() path calls
tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC.

The tracee must not sleep in TASK_TRACED holding this mutex.  Even if we
remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(),
another task doing PTRACE_ATTACH should not hang until it is killed or the
tracee resumes.

With this patch do_execve() does not use ->cred_guard_mutex directly and
we do not hold it throughout, instead:

	- introduce prepare_bprm_creds() helper, it locks the mutex
	  and calls prepare_exec_creds() to initialize bprm->cred.

	- install_exec_creds() drops the mutex after commit_creds(),
	  and thus before tracehook_report_exec()->ptrace_stop().

	  or, if exec fails,

	  free_bprm() drops this mutex when bprm->cred != NULL which
	  indicates install_exec_creds() was not called.
Reported-by: NTom Horsley <tom.horsley@att.net>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a2a8474c

05 9月, 2009 1 次提交

ocfs2: ocfs2_write_begin_nolock() should handle len=0 · 8379e7c4

由 Sunil Mushran 提交于 9月 04, 2009

Bug introduced by mainline commit e7432675
The bug causes ocfs2_write_begin_nolock() to oops when len=0.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

8379e7c4

03 9月, 2009 1 次提交

JFFS2: add missing verify buffer allocation/deallocation · bc8cec0d

由 Massimo Cirillo 提交于 8月 27, 2009

The function jffs2_nor_wbuf_flash_setup() doesn't allocate the verify buffer
if CONFIG_JFFS2_FS_WBUF_VERIFY is defined, so causing a kernel panic when
that macro is enabled and the verify function is called. Similarly the
jffs2_nor_wbuf_flash_cleanup() must free the buffer if
CONFIG_JFFS2_FS_WBUF_VERIFY is enabled.
The following patch fixes the problem.
The following patch applies to 2.6.30 kernel.
Signed-off-by: NMassimo Cirillo <maxcir@gmail.com>
Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org

bc8cec0d

02 9月, 2009 1 次提交

xfs: actually enable the swapext compat handler · 3725867d

由 Christoph Hellwig 提交于 9月 01, 2009

Fix a small typo in the compat ioctl handler that cause the swapext
compat handler to never be called.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NTorsten Kaiser <just.for.lkml@googlemail.com>
Tested-by: NTorsten Kaiser <just.for.lkml@googlemail.com>
Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

3725867d

01 9月, 2009 1 次提交

autofs4 - fix missed case when changing to use struct path · 37d0892c

由 Ian Kent 提交于 9月 01, 2009

In the recent change by Al Viro that changes verious subsystems
to use "struct path" one case was missed in the autofs4 module
which causes mounts to no longer expire.
Signed-off-by: NIan Kent <raven@themaw.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

37d0892c

31 8月, 2009 1 次提交

nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key · b1f1b8ce

由 Ryusuke Konishi 提交于 8月 30, 2009

This will fix the following preempt count underflow reported from
users with the title "[NILFS users] segctord problem" (Message-ID:
<949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID:
<debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>):

 WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0()
 Hardware name: HP Compaq 6530b (KR980UT#ABC)
 Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod
 Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7
 Call Trace:
  [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0
  [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0
  [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20
  [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0
  [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2]
  [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2]
  [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2]
  [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2]
  [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2]
  [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40
  [<ffffffff803c06e0>] ? __up_write+0xe0/0x150
  [<ffffffff80262959>] ? up_write+0x9/0x10
  [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2]
  [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2]
  [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2]
  [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2]
  [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2]
  [<ffffffff80252633>] ? add_timer+0x13/0x20
  [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90
  [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40
  [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
  [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
  [<ffffffff8025e556>] kthread+0x56/0x90
  [<ffffffff8020cdea>] child_rip+0xa/0x20
  [<ffffffff8025e500>] ? kthread+0x0/0x90
  [<ffffffff8020cde0>] ? child_rip+0x0/0x20

This problem was caused due to a missing radix_tree_preload() call in
the retry path of nilfs_btnode_prepare_change_key() function.
Reported-by: NEric A <eric225125@yahoo.com>
Reported-by: NJerome Poulin <jeromepoulin@gmail.com>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: NJerome Poulin <jeromepoulin@gmail.com>
Cc: stable@kernel.org

b1f1b8ce

29 8月, 2009 1 次提交

inotify: update the group mask on mark addition · 750a8870

由 Eric Paris 提交于 8月 28, 2009

Seperating the addition and update of marks in inotify resulted in a
regression in that inotify never gets events. The inotify group mask is
always 0. This mask should be updated any time a new mark is added.
Signed-off-by: NEric Paris <eparis@redhat.com>

750a8870

28 8月, 2009 4 次提交

inotify: fix length reporting and size checking · 83cb10f0

由 Eric Paris 提交于 8月 28, 2009

0db501bd introduced a regresion in that it now sends a nul
terminator but the length accounting when checking for space or
reporting to userspace did not take this into account.  This corrects
all of the rounding logic.
Signed-off-by: NEric Paris <eparis@redhat.com>

83cb10f0

inotify: do not send a block of zeros when no pathname is available · b962e731

由 Brian Rogers 提交于 8月 28, 2009

When an event has no pathname, there's no need to pad it with a null byte and
therefore generate an inotify_event sized block of zeros. This fixes a
regression introduced by commit 0db501bd where
my system wouldn't finish booting because some process was being confused by
this.
Signed-off-by: NBrian Rogers <brian@xyzw.org>
Signed-off-by: NEric Paris <eparis@redhat.com>

b962e731

ocfs2: invalidate dentry if its dentry_lock isn't initialized. · a1b08e75

由 Tao Ma 提交于 8月 27, 2009

In commit a5a0a630, when
ocfs2_attch_dentry_lock fails, we call an extra iput and reset
dentry->d_fsdata to NULL. This resolve a bug, but it isn't
completed and the dentry is still there. When we want to use
it again, ocfs2_dentry_revalidate doesn't catch it and return
true. That make future ocfs2_dentry_lock panic out.
One bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162.

The resolution is to add a check for dentry->d_fsdata in
revalidate process and return false if dentry->d_fsdata is NULL,
so that a new ocfs2_lookup will be called again.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

a1b08e75

AFS: Stop readlink() on AFS crashing due to NULL 'file' ptr · 9886e836

由 David Howells 提交于 8月 27, 2009

kAFS crashes when asked to read a symbolic link because page_getlink()
passes a NULL file pointer to read_mapping_page(), but afs_readpage()
expects a file pointer from which to extract a key.

Modify afs_readpage() to request the appropriate key from the calling
process's keyrings if a file struct is not supplied with one attached.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9886e836

27 8月, 2009 4 次提交

inotify: Ensure we alwasy write the terminating NULL. · 0db501bd

由 Eric W. Biederman 提交于 8月 27, 2009

Before the rewrite copy_event_to_user always wrote a terqminating '\0'
byte to user space after the filename.  Since the rewrite that
terminating byte was skipped if your filename is exactly a multiple of
event_size.  Ouch!

So add one byte to name_size before we round up and use clear_user to
set userspace to zero like /dev/zero does instead of copying the
strange nul_inotify_event.  I can't quite convince myself len_to_zero
will never exceed 16 and even if it doesn't clear_user should be more
efficient and a more accurate reflection of what the code is trying to
do.
Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

0db501bd

inotify: fix locking around inotify watching in the idr · dead537d

由 Eric Paris 提交于 8月 24, 2009

The are races around the idr storage of inotify watches. It's possible
that a watch could be found from sys_inotify_rm_watch() in the idr, but it
could be removed from the idr before that code does it's removal. Move the
locking and the refcnt'ing so that these have to happen atomically.
Signed-off-by: NEric Paris <eparis@redhat.com>

dead537d

inotify: do not BUG on idr entries at inotify destruction · cf437426

由 Eric Paris 提交于 8月 24, 2009

If an inotify watch is left in the idr when an fsnotify group is destroyed
this will lead to a BUG. This is not a dangerous situation and really
indicates a programming bug and leak of memory. This patch changes it to
use a WARN and a printk rather than killing people's boxes.
Signed-off-by: NEric Paris <eparis@redhat.com>

cf437426

inotify: seperate new watch creation updating existing watches · 52cef755

由 Eric Paris 提交于 8月 24, 2009

There is nothing known wrong with the inotify watch addition/modification
but this patch seperates the two code paths to make them each easy to
verify as correct.
Signed-off-by: NEric Paris <eparis@redhat.com>

52cef755

25 8月, 2009 2 次提交

NFSv4: Fix an infinite looping problem with the nfs4_state_manager · 7111dc73

由 Trond Myklebust 提交于 8月 24, 2009

Commit 76db6d95 (nfs41: add session setup
to the state manager) introduces an infinite loop possibility in the NFSv4
state manager. By first checking nfs4_has_session() before clearing the
NFS4CLNT_SESSION_SETUP flag, it allows for a situation where someone sets
that flag, but it never gets cleared, and so the state manager loops.

In fact commit c3fad1b1 (nfs41: add session
reset to state manager) causes this to happen every time we get a network
partition error.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: NDaniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7111dc73

mm: fix hugetlb bug due to user_shm_unlock call · 353d5c30

由 Hugh Dickins 提交于 8月 24, 2009

2.6.30's commit 8a0bdec1 removed
user_shm_lock() calls in hugetlb_file_setup() but left the
user_shm_unlock call in shm_destroy().

In detail:
Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
is not called in hugetlb_file_setup(). However, user_shm_unlock() is
called in any case in shm_destroy() and in the following
atomic_dec_and_lock(&up->__count) in free_uid() is executed and if
up->__count gets zero, also cleanup_user_struct() is scheduled.

Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
However, the ref counter up->__count gets unexpectedly non-positive and
the corresponding structs are freed even though there are live
references to them, resulting in a kernel oops after a lots of
shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles and CONFIG_USER_SCHED set.

Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
time of shm_destroy() may give a different answer from at the time
of hugetlb_file_setup().  And fixed newseg()'s no_id error path,
which has missed user_shm_unlock() ever since it came in 2.6.9.
Reported-by: NStefan Huber <shuber2@gmail.com>
Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
Tested-by: NStefan Huber <shuber2@gmail.com>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

353d5c30

24 8月, 2009 3 次提交

ext3: Improve error message that changing journaling mode on remount is not possible · 3c4cec65

由 Jan Kara 提交于 8月 24, 2009

This patch makes the error message about changing journaling mode on remount
more descriptive. Some people are going to hit this error now due to commit
bbae8bcc if they configure a kernel to default
to data=writeback mode. The problem happens if they have data=ordered set for
the root filesystem in /etc/fstab but not in the kernel command line (and they
don't use initrd). Their filesystem then gets mounted as data=writeback by
kernel but then their boot fails because init scripts won't be able to remount
the filesystem rw. Better error message will hopefully make it easier for them
to find the error in their setup and bother us less with error reports :).
Signed-off-by: NJan Kara <jack@suse.cz>

3c4cec65

ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED · 6d418076

由 Theodore Ts'o 提交于 8月 10, 2009

The old description for this configuration option was perhaps not
completely balanced in terms of describing the tradeoffs of using a
default of data=writeback vs. data=ordered.  Despite the fact that old
description very strongly recomended disabling this feature, all of
the major distributions have elected to preserve the existing 'legacy'
default, which is a strong hint that it perhaps wasn't telling the
whole story.

This revised description has been vetted by a number of ext3
developers as being better at informing the user about the tradeoffs
of enabling or disabling this configuration feature.

Cc: linux-ext4@vger.kernel.org
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NJan Kara <jack@suse.cz>

6d418076

kernel_read: redefine offset type · 6777d773

由 Mimi Zohar 提交于 8月 21, 2009

vfs_read() offset is defined as loff_t, but kernel_read()
offset is only defined as unsigned long. Redefine
kernel_read() offset as loff_t.

Cc: stable@kernel.org
Signed-off-by: NMimi Zohar <zohar@us.ibm.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

6777d773

22 8月, 2009 1 次提交

Re-introduce page mapping check in mark_buffer_dirty() · 8e9d78ed

由 Linus Torvalds 提交于 8月 21, 2009

In commit a8e7d49a ("Fix race in
create_empty_buffers() vs __set_page_dirty_buffers()"), I removed a test
for a NULL page mapping unintentionally when some of the code inside
__set_page_dirty() was moved to the callers.

That removal generally didn't matter, since a filesystem would serialize
truncation (which clears the page mapping) against writing (which marks
the buffer dirty), so locking at a higher level (either per-page or an
inode at a time) should mean that the buffer page would be stable.  And
indeed, nothing bad seemed to happen.

Except it turns out that apparently reiserfs does something odd when
under load and writing out the journal, and we have a number of bugzilla
entries that look similar:

	http://bugzilla.kernel.org/show_bug.cgi?id=13556
	http://bugzilla.kernel.org/show_bug.cgi?id=13756
	http://bugzilla.kernel.org/show_bug.cgi?id=13876

and it looks like reiserfs depended on that check (the common theme
seems to be "data=journal", and a journal writeback during a truncate).

I suspect reiserfs should have some additional locking, but in the
meantime this should get us back to the pre-2.6.29 behavior.
Pattern-pointed-out-by: NRoland Kletzing <devzero@web.de>
Cc: stable@kernel.org (2.6.29 and 2.6.30)
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8e9d78ed

21 8月, 2009 3 次提交

btrfs: fix inode rbtree corruption · 03e860bd

由 From: Nick Piggin 提交于 8月 21, 2009

Node may not be inserted over existing node. This causes inode tree
corruption and I was seeing crashes in inode_tree_del which I can not
reproduce after this patch.

The other way to fix this would be to tie inode lifetime in the rbtree
with inode while not in freeing state. I had a look at this but it is
not so trivial at this point. At least this patch gets things working again.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>
Acked-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

03e860bd

ocfs2/dlm: Wait on lockres instead of erroring cancel requests · c795b33b

由 Goldwyn Rodrigues 提交于 8月 20, 2009

In case a downconvert is queued, and a flock receives a signal,
BUG_ON(lockres->l_action != OCFS2_AST_INVALID) is triggered
because a lock cancel triggers a dlmunlock while an AST is
scheduled.

To avoid this, allow a LKM_CANCEL to pass through, and let it
wait on __dlm_wait_on_lockres().
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.de>
Acked-off-by: NMark Fasheh <mfasheh@suse.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

c795b33b

ocfs2: Add missing lock name · a8b88d3d

由 Jan Kara 提交于 8月 20, 2009

There is missing name for NFSSync cluster lock. This makes lockdep unhappy
because we end up passing NULL to lockdep when initializing lock key. Fix it.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

a8b88d3d

19 8月, 2009 3 次提交

mm: revert "oom: move oom_adj value" · 0753ba01

由 KOSAKI Motohiro 提交于 8月 18, 2009

The commit 2ff05b2b (oom: move oom_adj value) moveed the oom_adj value to
the mm_struct.  It was a very good first step for sanitize OOM.

However Paul Menage reported the commit makes regression to his job
scheduler.  Current OOM logic can kill OOM_DISABLED process.

Why? His program has the code of similar to the following.

	...
	set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */
	...
	if (vfork() == 0) {
		set_oom_adj(0); /* Invoked child can be killed */
		execve("foo-bar-cmd");
	}
	....

vfork() parent and child are shared the same mm_struct.  then above
set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also
change oom_adj for vfork() parent.  Then, vfork() parent (job scheduler)
lost OOM immune and it was killed.

Actually, fork-setting-exec idiom is very frequently used in userland program.
We must not break this assumption.

Then, this patch revert commit 2ff05b2b and related commit.

Reverted commit list
---------------------
- commit 2ff05b2b (oom: move oom_adj value from task_struct to mm_struct)
- commit 4d8b9135 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
- commit 81236810 (oom: only oom kill exiting tasks with attached memory)
- commit 933b787b (mm: copy over oom_adj value at fork time)
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0753ba01

vfs: make get_sb_pseudo set s_maxbytes to value that can be cast to signed · 89a4eb4b

由 Jeff Layton 提交于 8月 18, 2009

get_sb_pseudo sets s_maxbytes to ~0ULL which becomes negative when cast
to a signed value.  Fix it to use MAX_LFS_FILESIZE which casts properly
to a positive signed value.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NSteve French <smfrench@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

89a4eb4b

nilfs2: fix oopses with doubly mounted snapshots · a9245860

由 Ryusuke Konishi 提交于 8月 19, 2009

will fix kernel oopses like the following:

 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1
 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2
 # umount /test1
 # umount /test2

BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069
in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2
1 lock held by umount.nilfs2/3886:
 #0:  (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c
irq event stamp: 1219
hardirqs last  enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119
hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119
softirqs last  enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad
softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a
Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55
Call Trace:
 [<c1023549>] __might_sleep+0x107/0x10e
 [<c13603c0>] do_page_fault+0x246/0x397
 [<c136017a>] ? do_page_fault+0x0/0x397
 [<c135e753>] error_code+0x6b/0x70
 [<c136017a>] ? do_page_fault+0x0/0x397
 [<c104f805>] ? __lock_acquire+0x91/0x12fd
 [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
 [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
 [<c1050b2b>] lock_acquire+0xba/0xdd
 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
 [<c135d4fe>] down_write+0x2a/0x46
 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
 [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
 [<c104ea2c>] ? mark_held_locks+0x43/0x5b
 [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133
 [<c104ece4>] ? trace_hardirqs_on+0xb/0xd
 [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2]
 [<c10b3352>] generic_shutdown_super+0x49/0xb8
 [<c10b33de>] kill_block_super+0x1d/0x31
 [<c10e6599>] ? vfs_quota_off+0x0/0x12
 [<c10b398f>] deactivate_super+0x57/0x6c
 [<c10c4bc3>] mntput_no_expire+0x8c/0xb4
 [<c10c5094>] sys_umount+0x27f/0x2a4
 [<c10c50c6>] sys_oldumount+0xd/0xf
 [<c10031a4>] sysenter_do_call+0x12/0x38
 ...

This turns out to be a bug brought by an -rc1 patch ("nilfs2: simplify
remaining sget() use").

In the patch, a new "put resource" function, nilfs_put_sbinfo()
was introduced to delay freeing nilfs_sb_info struct.

But the nilfs_put_sbinfo() mistakenly used atomic_dec_and_test()
function to check the reference count, and it caused the nilfs_sb_info
was freed when user mounted a snapshot twice.

This bug also suggests there was unseen memory leak in usual mount
/umount operations for nilfs.
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

a9245860

18 8月, 2009 8 次提交

nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint() · 1154ecbd

由 Zhang Qiang 提交于 8月 18, 2009

'ns_cno' of structure 'the_nilfs' must be protected from segment
writer, in other words, the caller of nilfs_get_checkpoint should hold
read lock for nilfs->ns_segctor_sem.  This patch adds the lock/unlock
operations in nilfs_attach_checkpoint() when calling
nilfs_cpfile_get_checkpoint().
Signed-off-by: NZhang Qiang <zhangqiang.buaa@gmail.com>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

1154ecbd

9p: remove unnecessary v9fses->options which duplicates the mount string · 4b53e4b5

由 Abhishek Kulkarni 提交于 8月 17, 2009

The mount options string is saved in sb->s_options. This patch removes
the redundant duplicating of the mount options. Also, since we are not
displaying anything special in show options, we replace v9fs_show_options
with generic_show_options for now.
Signed-off-by: NAbhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>

4b53e4b5

9p: Add missing cast for the error return value in v9fs_get_inode · 48559b4c

由 Abhishek Kulkarni 提交于 8月 17, 2009

Cast the error return value (ENOMEM) in v9fs_get_inode() to its
correct type using ERR_PTR.
Signed-off-by: NAbhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>

48559b4c

ocfs2: Don't oops in ocfs2_kill_sb on a failed mount · 5fd13189

由 Jan Kara 提交于 7月 30, 2009

If we fail to mount the filesystem, we have to be careful not to dereference
uninitialized structures in ocfs2_kill_sb.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

5fd13189

9p: Remove redundant inode uid/gid assignment · 4d3297ca

由 Abhishek Kulkarni 提交于 7月 19, 2009

Remove a redundant update of inode's i_uid and i_gid
after v9fs_get_inode() since the latter already sets up
a new inode and sets the proper uid and gid values.
Signed-off-by: NAbhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>

4d3297ca

9p: Fix possible regressions when ->get_sb fails. · 1b5ab3e8

由 Abhishek Kulkarni 提交于 7月 19, 2009

->get_sb can fail causing some badness. this patch fixes
   * clear sb->fs_s_info in kill_sb.
   * deactivate_locked_super() calls kill_sb (v9fs_kill_super) which closes the
     destroys the client, clunks all its fids and closes the v9fs session.
     Attempting to do it twice will cause an oops.
Signed-off-by: NAbhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>

1b5ab3e8

9p: Fix v9fs show_options · 4f403832

由 Abhishek Kulkarni 提交于 7月 19, 2009

Add the delimiter ',' before the options when they are passed
and check if no option parameters are passed to prevent displaying
NULL in /proc/mounts.
Signed-off-by: NAbhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>

4f403832

9p: Fix possible memleak in v9fs_inode_from fid. · 02bc3567

由 Abhishek Kulkarni 提交于 7月 19, 2009

Add missing p9stat_free in v9fs_inode_from_fid to avoid
any possible leaks.
Signed-off-by: NAbhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>

02bc3567

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功