提交 · 353d5c30c666580347515da609dd74a2b8e9b828 · openeuler / raspberrypi-kernel

25 8月, 2009 1 次提交

mm: fix hugetlb bug due to user_shm_unlock call · 353d5c30

由 Hugh Dickins 提交于 8月 24, 2009

2.6.30's commit 8a0bdec1 removed
user_shm_lock() calls in hugetlb_file_setup() but left the
user_shm_unlock call in shm_destroy().

In detail:
Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
is not called in hugetlb_file_setup(). However, user_shm_unlock() is
called in any case in shm_destroy() and in the following
atomic_dec_and_lock(&up->__count) in free_uid() is executed and if
up->__count gets zero, also cleanup_user_struct() is scheduled.

Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
However, the ref counter up->__count gets unexpectedly non-positive and
the corresponding structs are freed even though there are live
references to them, resulting in a kernel oops after a lots of
shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles and CONFIG_USER_SCHED set.

Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
time of shm_destroy() may give a different answer from at the time
of hugetlb_file_setup().  And fixed newseg()'s no_id error path,
which has missed user_shm_unlock() ever since it came in 2.6.9.
Reported-by: NStefan Huber <shuber2@gmail.com>
Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
Tested-by: NStefan Huber <shuber2@gmail.com>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

353d5c30

24 8月, 2009 1 次提交

kernel_read: redefine offset type · 6777d773

由 Mimi Zohar 提交于 8月 21, 2009

vfs_read() offset is defined as loff_t, but kernel_read()
offset is only defined as unsigned long. Redefine
kernel_read() offset as loff_t.

Cc: stable@kernel.org
Signed-off-by: NMimi Zohar <zohar@us.ibm.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

6777d773

22 8月, 2009 1 次提交

Re-introduce page mapping check in mark_buffer_dirty() · 8e9d78ed

由 Linus Torvalds 提交于 8月 21, 2009

In commit a8e7d49a ("Fix race in
create_empty_buffers() vs __set_page_dirty_buffers()"), I removed a test
for a NULL page mapping unintentionally when some of the code inside
__set_page_dirty() was moved to the callers.

That removal generally didn't matter, since a filesystem would serialize
truncation (which clears the page mapping) against writing (which marks
the buffer dirty), so locking at a higher level (either per-page or an
inode at a time) should mean that the buffer page would be stable.  And
indeed, nothing bad seemed to happen.

Except it turns out that apparently reiserfs does something odd when
under load and writing out the journal, and we have a number of bugzilla
entries that look similar:

	http://bugzilla.kernel.org/show_bug.cgi?id=13556
	http://bugzilla.kernel.org/show_bug.cgi?id=13756
	http://bugzilla.kernel.org/show_bug.cgi?id=13876

and it looks like reiserfs depended on that check (the common theme
seems to be "data=journal", and a journal writeback during a truncate).

I suspect reiserfs should have some additional locking, but in the
meantime this should get us back to the pre-2.6.29 behavior.
Pattern-pointed-out-by: NRoland Kletzing <devzero@web.de>
Cc: stable@kernel.org (2.6.29 and 2.6.30)
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8e9d78ed

21 8月, 2009 1 次提交

btrfs: fix inode rbtree corruption · 03e860bd

由 From: Nick Piggin 提交于 8月 21, 2009

Node may not be inserted over existing node. This causes inode tree
corruption and I was seeing crashes in inode_tree_del which I can not
reproduce after this patch.

The other way to fix this would be to tie inode lifetime in the rbtree
with inode while not in freeing state. I had a look at this but it is
not so trivial at this point. At least this patch gets things working again.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>
Acked-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

03e860bd

19 8月, 2009 3 次提交

mm: revert "oom: move oom_adj value" · 0753ba01

由 KOSAKI Motohiro 提交于 8月 18, 2009

The commit 2ff05b2b (oom: move oom_adj value) moveed the oom_adj value to
the mm_struct.  It was a very good first step for sanitize OOM.

However Paul Menage reported the commit makes regression to his job
scheduler.  Current OOM logic can kill OOM_DISABLED process.

Why? His program has the code of similar to the following.

	...
	set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */
	...
	if (vfork() == 0) {
		set_oom_adj(0); /* Invoked child can be killed */
		execve("foo-bar-cmd");
	}
	....

vfork() parent and child are shared the same mm_struct.  then above
set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also
change oom_adj for vfork() parent.  Then, vfork() parent (job scheduler)
lost OOM immune and it was killed.

Actually, fork-setting-exec idiom is very frequently used in userland program.
We must not break this assumption.

Then, this patch revert commit 2ff05b2b and related commit.

Reverted commit list
---------------------
- commit 2ff05b2b (oom: move oom_adj value from task_struct to mm_struct)
- commit 4d8b9135 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
- commit 81236810 (oom: only oom kill exiting tasks with attached memory)
- commit 933b787b (mm: copy over oom_adj value at fork time)
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0753ba01

vfs: make get_sb_pseudo set s_maxbytes to value that can be cast to signed · 89a4eb4b

由 Jeff Layton 提交于 8月 18, 2009

get_sb_pseudo sets s_maxbytes to ~0ULL which becomes negative when cast
to a signed value.  Fix it to use MAX_LFS_FILESIZE which casts properly
to a positive signed value.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NSteve French <smfrench@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

89a4eb4b

nilfs2: fix oopses with doubly mounted snapshots · a9245860

由 Ryusuke Konishi 提交于 8月 19, 2009

will fix kernel oopses like the following:

 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1
 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2
 # umount /test1
 # umount /test2

BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069
in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2
1 lock held by umount.nilfs2/3886:
 #0:  (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c
irq event stamp: 1219
hardirqs last  enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119
hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119
softirqs last  enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad
softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a
Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55
Call Trace:
 [<c1023549>] __might_sleep+0x107/0x10e
 [<c13603c0>] do_page_fault+0x246/0x397
 [<c136017a>] ? do_page_fault+0x0/0x397
 [<c135e753>] error_code+0x6b/0x70
 [<c136017a>] ? do_page_fault+0x0/0x397
 [<c104f805>] ? __lock_acquire+0x91/0x12fd
 [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
 [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
 [<c1050b2b>] lock_acquire+0xba/0xdd
 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
 [<c135d4fe>] down_write+0x2a/0x46
 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
 [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
 [<c104ea2c>] ? mark_held_locks+0x43/0x5b
 [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133
 [<c104ece4>] ? trace_hardirqs_on+0xb/0xd
 [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2]
 [<c10b3352>] generic_shutdown_super+0x49/0xb8
 [<c10b33de>] kill_block_super+0x1d/0x31
 [<c10e6599>] ? vfs_quota_off+0x0/0x12
 [<c10b398f>] deactivate_super+0x57/0x6c
 [<c10c4bc3>] mntput_no_expire+0x8c/0xb4
 [<c10c5094>] sys_umount+0x27f/0x2a4
 [<c10c50c6>] sys_oldumount+0xd/0xf
 [<c10031a4>] sysenter_do_call+0x12/0x38
 ...

This turns out to be a bug brought by an -rc1 patch ("nilfs2: simplify
remaining sget() use").

In the patch, a new "put resource" function, nilfs_put_sbinfo()
was introduced to delay freeing nilfs_sb_info struct.

But the nilfs_put_sbinfo() mistakenly used atomic_dec_and_test()
function to check the reference count, and it caused the nilfs_sb_info
was freed when user mounted a snapshot twice.

This bug also suggests there was unseen memory leak in usual mount
/umount operations for nilfs.
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

a9245860

18 8月, 2009 4 次提交

nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint() · 1154ecbd

由 Zhang Qiang 提交于 8月 18, 2009

'ns_cno' of structure 'the_nilfs' must be protected from segment
writer, in other words, the caller of nilfs_get_checkpoint should hold
read lock for nilfs->ns_segctor_sem.  This patch adds the lock/unlock
operations in nilfs_attach_checkpoint() when calling
nilfs_cpfile_get_checkpoint().
Signed-off-by: NZhang Qiang <zhangqiang.buaa@gmail.com>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

1154ecbd

inotify: start watch descriptor count at 1 · 08e53fcb

由 Eric Paris 提交于 8月 16, 2009

The inotify_add_watch man page specifies that inotify_add_watch() will
return a non-negative integer.  However, historically the inotify
watches started at 1, not at 0.

Turns out that the inotifywait program provided by the inotify-tools
package doesn't properly handle a 0 watch descriptor.  In 7e790dd5 we
changed from starting at 1 to starting at 0.  This patch starts at 1,
just like in previous kernels, but also just like in previous kernels
it's possible for it to wrap back to 0.  This preserves the kernel
functionality exactly like it was before the patch (neither method broke
the spec)
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

08e53fcb

inotify: tail drop inotify q_overflow events · cd94c8bb

由 Eric Paris 提交于 8月 16, 2009

In f44aebcc the tail drop logic of events with no file backing
(q_overflow and in_ignored) was reversed so IN_IGNORED events would
never be tail dropped.  This now means that Q_OVERFLOW events are NOT
tail dropped.  The fix is to not tail drop IN_IGNORED, but to tail drop
Q_OVERFLOW.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cd94c8bb

notify: unused event private race · eef3a116

由 Eric Paris 提交于 8月 16, 2009

inotify decides if private data it passed to get added to an event was
used by checking list_empty().  But it's possible that the event may
have been dequeued and the private event removed so it would look empty.

The fix is to use the return code from fsnotify_add_notify_event rather
than looking at the list.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

eef3a116

17 8月, 2009 1 次提交

xfs: fix locking in xfs_iget_cache_hit · bc990f5c

由 Christoph Hellwig 提交于 8月 16, 2009

The locking in xfs_iget_cache_hit currently has numerous problems:

 - we clear the reclaim tag without i_flags_lock which protects
   modifications to it
 - we call inode_init_always which can sleep with pag_ici_lock
   held (this is oss.sgi.com BZ #819)
 - we acquire and drop i_flags_lock a lot and thus provide no
   consistency between the various flags we set/clear under it

This patch fixes all that with a major revamp of the locking in
the function.  The new version acquires i_flags_lock early and
only drops it once we need to call into inode_init_always or before
calling xfs_ilock.

This patch fixes a bug seen in the wild where we race modifying the
reclaim tag.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

bc990f5c

16 8月, 2009 1 次提交

poll/select: initialize triggered field of struct poll_wqueues · b2add73d

由 Guillaume Knispel 提交于 8月 15, 2009

The triggered field of struct poll_wqueues introduced in commit
5f820f64 ("poll: allow f_op->poll to
sleep").

It was first set to 1 in pollwake() (now __pollwake() ), tested and
later set to 0 in poll_schedule_timeout(), but not initialized before.

As a result when the process needs to sleep, triggered was likely to be
non-zero even if pollwake() is not called before the first
poll_schedule_timeout(), meaning schedule_hrtimeout_range() would not be
called and an extra loop calling all ->poll() would be done.

This patch initialize triggered to 0 in poll_initwait() so the ->poll()
are not called twice before the process goes to sleep when it needs to.
Signed-off-by: NGuillaume Knispel <gknispel@proformatique.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b2add73d

14 8月, 2009 1 次提交

GFS2: Fix permissions on "recover" file · d7e623da

由 Steven Whitehouse 提交于 8月 11, 2009

Although this file is only ever written and not read by
userspace, it seems that the utils are opening this
file O_RDWR, so we need to allow that.

Also fixes the whitespace which seemed to be broken.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
Cc: David Teigland <teigland@redhat.com>

d7e623da

12 8月, 2009 13 次提交

NFS: Fix an O_DIRECT Oops... · 1ae88b2e

由 Trond Myklebust 提交于 8月 12, 2009

We can't call nfs_readdata_release()/nfs_writedata_release() without
first initialising and referencing args.context. Doing so inside
nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment()
causes an Oops.

We should rather be calling nfs_readdata_free()/nfs_writedata_free() in
those cases.

Looking at the O_DIRECT code, the "struct nfs_direct_req" is already
referencing the nfs_open_context for us. Since the readdata and writedata
structures carry a reference to that, we can simplify things by getting rid
of the extra nfs_open_context references, so that we can replace all
instances of nfs_readdata_release()/nfs_writedata_release().
Reported-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: NCatalin Marinas <catalin.marinas@arm.com>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1ae88b2e

xfs: fix spin_is_locked assert on uni-processor builds · a8914f3a

由 Christoph Hellwig 提交于 8月 10, 2009

Without SMP or preemption spin_is_locked always returns false,
so we can't do an assert with it.  Instead use assert_spin_locked,
which does the right thing on all builds.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
Reported-by: NJohannes Engel <jcnengel@googlemail.com>
Tested-by: NJohannes Engel <jcnengel@googlemail.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

a8914f3a

xfs: check for dinode realtime flag corruption · b89d4208

由 Christoph Hellwig 提交于 8月 10, 2009

Ramon tested XFS with a modified version of fsfuzzer and hit a NULL
pointer dereference in __xfs_get_blocks due to the RT device target
pointer being NULL.

To fix this reject inode with the realtime bit set on a a filesystem
without an RT subvolume during inode read.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Reported-by: NRamon de Carvalho Valle <ramon@risesecurity.org>
Tested-by: NRamon de Carvalho Valle <ramon@risesecurity.org>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

b89d4208

use XFS_CORRUPTION_ERROR in xfs_btree_check_sblock · e0c222c4

由 Eric Sandeen 提交于 7月 20, 2009

In Red Hat Bug 512552
 - Can't write to XFS mount during raid5 resync

a user ran into corruption while resyncing a raid, and we failed
a consistency test, but didn't get much more info; it'd be nice
to call XFS_CORRUPTION_ERROR here so we can see the buffer
contents.
Signed-off-by: NEric Sandeen <sandeen@sandeen.net>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

e0c222c4

xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_get · ddd3a14e

由 Christoph Hellwig 提交于 7月 18, 2009

xfs_attr_rmtval_get is always called with i_lock held, but i_lock is taken
in reclaim context so all allocations under it must avoid recursions into
the filesystem.

Reported by the new reclaim context tracing in lockdep.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

ddd3a14e

xfs: switch to NOFS allocation under i_lock in xfs_readlink_bmap · 7b02ecb3

由 Christoph Hellwig 提交于 7月 18, 2009

xfs_readlink_bmap is called with i_lock held, but i_lock is taken in
reclaim context so all allocations under it must avoid recursions into
the filesystem.

Reported by the new reclaim context tracing in lockdep.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

7b02ecb3

xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_set · 10746e47

由 Christoph Hellwig 提交于 7月 18, 2009

xfs_attr_rmtval_set is always called with i_lock held, and i_lock is taken
in reclaim context so all allocations under it must avoid recursions into
the filesystem.

Reported by the new reclaim context tracing in lockdep.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

10746e47

xfs: switch to NOFS allocation under i_lock in xfs_buf_associate_memory · 36fae17a

由 Christoph Hellwig 提交于 7月 18, 2009

xfs_buf_associate_memory is used for setting up the spare buffer for the
log wrap case in xlog_sync which can happen under i_lock when called from
xfs_fsync. The i_lock mutex is taken in reclaim context so all allocations
under it must avoid recursions into the filesystem. There are a couple
more uses of xfs_buf_associate_memory in the log recovery code that are
also affected by this, but I'd rather keep the code simple than passing on
a gfp_mask argument. Longer term we should just stop requiring the memoery
allocation in xlog_sync by some smaller rework of the buffer layer.

Reported by the new reclaim context tracing in lockdep.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

36fae17a

xfs: switch to NOFS allocation under i_lock in xfs_dir_cilookup_result · 3f52c2f0

由 Christoph Hellwig 提交于 7月 18, 2009

xfs_dir_cilookup_result is always called with i_lock held, but i_lock is taken
in reclaim context so all allocations under it must avoid recursions into the
filesystem.

Reported by the new reclaim context tracing in lockdep.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

3f52c2f0

xfs: switch to NOFS allocation under i_lock in xfs_da_buf_make · 73195ed7

由 Christoph Hellwig 提交于 7月 18, 2009

i_lock is taken in the reclaim context so all allocations under it
must avoid recursions into the filesystem.

Reported by the new reclaim context tracing in lockdep.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

73195ed7

xfs: switch to NOFS allocation under i_lock in xfs_da_state_alloc · f41d7fb9

由 Christoph Hellwig 提交于 7月 18, 2009

xfs_da_state_alloc is always called with i_lock held, but i_lock is taken in
reclaim context so all allocations under it must avoid recursions into the
filesystem.

Reported by the new reclaim context tracing in lockdep.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

f41d7fb9

xfs: switch to NOFS allocation under i_lock in xfs_getbmap · ca35dcd6

由 Christoph Hellwig 提交于 7月 18, 2009

xfs_getbmap allocates memory with i_lock held, but i_lock is taken in
reclaim context so all allocations under it must avoid recursions into
the filesystem.

Reported by the new reclaim context tracing in lockdep.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

ca35dcd6

xfs: avoid memory allocation under m_peraglock in growfs code · 0cc6eee1

由 Christoph Hellwig 提交于 7月 18, 2009

Allocate the memory for the larger m_perag array before taking the
per-AG lock as the per-AG lock can be taken under the i_lock which
can be taken from reclaim context.

Reported by the new reclaim context tracing in lockdep.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

0cc6eee1

11 8月, 2009 1 次提交

ocfs2: Fix possible deadlock when extending quota file · b409d7a0

由 Jan Kara 提交于 8月 06, 2009

In OCFS2, allocator locks rank above transaction start. Thus we
cannot extend quota file from inside a transaction less we could
deadlock.

We solve the problem by starting transaction not already in
ocfs2_acquire_dquot() but only in ocfs2_local_read_dquot() and
ocfs2_global_read_dquot() and we allocate blocks to quota files before starting
the transaction.  In case we crash, quota files will just have a few blocks
more but that's no problem since we just use them next time we extend the
quota file.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

b409d7a0

10 8月, 2009 3 次提交

mm_for_maps: take ->cred_guard_mutex to fix the race with exec · 704b836c

由 Oleg Nesterov 提交于 7月 10, 2009

The problem is minor, but without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.

Now we do not need to re-check task->mm after ptrace_may_access(), it
can't be changed to the new mm under us.

Strictly speaking, this also fixes another very minor problem. Unless
security check fails or the task exits mm_for_maps() should never
return NULL, the caller should get either old or new ->mm.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

704b836c

mm_for_maps: shift down_read(mmap_sem) to the caller · 00f89d21

由 Oleg Nesterov 提交于 7月 10, 2009

mm_for_maps() takes ->mmap_sem after security checks, this looks
strange and obfuscates the locking rules. Move this lock to its
single caller, m_start().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

00f89d21

mm_for_maps: simplify, use ptrace_may_access() · 13f0feaf

由 Oleg Nesterov 提交于 6月 23, 2009

It would be nice to kill __ptrace_may_access(). It requires task_lock(),
but this lock is only needed to read mm->flags in the middle.

Convert mm_for_maps() to use ptrace_may_access(), this also simplifies
the code a little bit.

Also, we do not need to take ->mmap_sem in advance. In fact I think
mm_for_maps() should not play with ->mmap_sem at all, the caller should
take this lock.

With or without this patch, without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: NSerge Hallyn <serue@us.ibm.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

13f0feaf

08 8月, 2009 9 次提交

ocfs2: keep index within status_map[] · 8a57a9dd

由 Roel Kluin 提交于 8月 06, 2009

Do not exceed array status_map[]
Signed-off-by: NRoel Kluin <roel.kluin@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

8a57a9dd

ocfs2: Initialize the cluster we're writing to in a non-sparse extend · e7432675

由 Sunil Mushran 提交于 8月 06, 2009

In a non-sparse extend, we correctly allocate (and zero) the clusters between
the old_i_size and pos, but we don't zero the portions of the cluster we're
writing to outside of pos<->len.

It handles clustersize > pagesize and blocksize < pagesize.

[Cleaned up by Joel Becker.]
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

e7432675

Btrfs: fix balancing oops when invalidate_inode_pages2 returns EBUSY · ceab36ed

由 Yan Zheng 提交于 8月 07, 2009

invalidate_inode_pages2_range may return -EBUSY occasionally
which results Oops. This patch fixes the issue by moving
invalidate_inode_pages2_range into a loop and keeping calling
it until the return value is not -EBUSY.

The EBUSY return is temporary, and can happen when the btrfs release page
function is unable to release a page because the EXTENT_LOCK
bit is set.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ceab36ed

Btrfs: correct error-handling zlib error handling · 60f2e8f8

由 Julia Lawall 提交于 8月 07, 2009

find_zlib_workspace returns an ERR_PTR value in an error case instead of NULL.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@match exists@
expression x, E;
statement S1, S2;
@@

x = find_zlib_workspace(...)
... when != x = E
(
*  if (x == NULL || ...) S1 else S2
|
*  if (x == NULL && ...) S1 else S2
)
// </smpl>
Signed-off-by: NJulia Lawall <julia@diku.dk>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

60f2e8f8

Btrfs: remove superfluous NULL pointer check in btrfs_rename() · 4baf8c92

由 Bartlomiej Zolnierkiewicz 提交于 8月 07, 2009

This takes care of the following entry from Dan's list:

fs/btrfs/inode.c +4788 btrfs_rename(36) warning: variable derefenced before check 'old_inode'
Reported-by: NDan Carpenter <error27@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Julia Lawall <julia@diku.dk>
Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

4baf8c92

flat: fix uninitialized ptr with shared libs · 3440625d

由 Linus Torvalds 提交于 8月 06, 2009

The new credentials code broke load_flat_shared_library() as it now uses
an uninitialized cred pointer.
Reported-by: NBernd Schmidt <bernds_cb1@t-online.de>
Tested-by: NBernd Schmidt <bernds_cb1@t-online.de>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3440625d

vfs: mnt_want_write_file(): fix special file handling · 2d8dd38a

由 OGAWA Hirofumi 提交于 8月 06, 2009

I suspect that mnt_want_write_file() may have wrong assumption.  I think
mnt_want_write_file() is assuming it increments ->mnt_writers if
(file->f_mode & FMODE_WRITE).  But, if it's special_file(), it is false?
Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: NDave Hansen <dave@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d8dd38a

compat_ioctl: hook up compat handler for FIEMAP ioctl · 69130c7c

由 Eric Sandeen 提交于 8月 06, 2009

The FIEMAP_IOC_FIEMAP mapping ioctl was missing a 32-bit compat handler,
which means that 32-bit suerspace on 64-bit kernels cannot use this ioctl
command.

The structure is nicely aligned, padded, and sized, so it is just this
simple.

Tested w/ 32-bit ioctl tester (from Josef) on a 64-bit kernel on ext4.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Mark Lord <lkml@rtr.ca>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Josef Bacik <josef@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

69130c7c

xfs: fix freeing of inodes not yet added to the inode cache · b36ec042

由 Christoph Hellwig 提交于 8月 07, 2009

When freeing an inode that lost race getting added to the inode cache we
must not call into ->destroy_inode, because that would delete the inode
that won the race from the inode cache radix tree.

This patch uses splits a new xfs_inode_free helper out of xfs_ireclaim
and uses that plus __destroy_inode to make sure we really only free
the memory allocted for the inode that lost the race, and not mess with
the inode cache state.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
Reported-by: NAlex Samad <alex@samad.com.au>
Reported-by: NAndrew Randrianasulu <randrik@mail.ru>
Reported-by: NStephane <sharnois@max-t.com>
Reported-by: NTommy <tommy@news-service.com>
Reported-by: NMiah Gregory <mace@darksilence.net>
Reported-by: NGabriel Barazer <gabriel@oxeva.fr>
Reported-by: NLeandro Lucarella <llucax@gmail.com>
Reported-by: NDaniel Burr <dburr@fami.com.au>
Reported-by: NNickolay <newmail@spaces.ru>
Reported-by: NMichael Guntsche <mike@it-loops.com>
Reported-by: NDan Carley <dan.carley+linuxkern-bugs@gmail.com>
Reported-by: NMichael Ole Olsen <gnu@gmx.net>
Reported-by: NMichael Weissenbacher <mw@dermichi.com>
Reported-by: NMartin Spott <Martin.Spott@mgras.net>
Reported-by: NChristian Kujau <lists@nerdbynature.de>
Tested-by: NMichael Guntsche <mike@it-loops.com>
Tested-by: NDan Carley <dan.carley+linuxkern-bugs@gmail.com>
Tested-by: NChristian Kujau <lists@nerdbynature.de>

b36ec042