Commits · 81a052273998f94b098945c4c313e05246956eb2 · openanolis / cloud-kernel

26 Mar, 2009 15 commits

ext3: Use lowercase names of quota functions · 81a05227

Committed by Jan Kara on Jan 26, 2009

Use lowercase names of quota functions instead of old uppercase ones.
Signed-off-by: Jan Kara <jack@suse.cz>
CC: linux-ext4@vger.kernel.org

81a05227

ext2: Use lowercase names of quota functions · 6f90bee5

Committed by Jan Kara on Jan 26, 2009

Use lowercase names of quota functions instead of old uppercase ones.
Signed-off-by: Jan Kara <jack@suse.cz>
CC: linux-ext4@vger.kernel.org

6f90bee5

ramfs: Remove quota call · 31464955

Committed by Jan Kara on Jan 26, 2009

Ramfs has no business in quotas.
Signed-off-by: Jan Kara <jack@suse.cz>

31464955

vfs: Use lowercase names of quota functions · 9e3509e2

Committed by Jan Kara on Jan 26, 2009

Use lowercase names of quota functions instead of old uppercase ones.
Signed-off-by: Jan Kara <jack@suse.cz>
CC: Alexander Viro <viro@zeniv.linux.org.uk>

9e3509e2

quota: Remove dqbuf_t and other cleanups · d26ac1a8

Committed by Jan Kara on Jan 26, 2009

Remove bogus typedef which is just a definition of char *.
Remove unnecessary type casts.
Substitute freedqbuf() with kfree.
Signed-off-by: Jan Kara <jack@suse.cz>

d26ac1a8

quota: Remove NODQUOT macro · dd6f3c6d

Committed by Jan Kara on Jan 26, 2009

Remove this macro which is just a definition of NULL. Fix a few coding style
issues along the way.
Signed-off-by: Jan Kara <jack@suse.cz>

dd6f3c6d

quota: Make global quota locks cacheline aligned · c516610c

Committed by Jan Kara on Jan 26, 2009

Andrew Morton has suggested that three global quota locks can end up in the
same cacheline which can result in bad cacheline ping-pong on SMP machines.
Make locks cacheline aligned so that we avoid this problem (thanks go to
Andrew for the idea).
Signed-off-by: Jan Kara <jack@suse.cz>
CC: Andrew Morton <akpm@linux-foundation.org>

c516610c
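
The alignment idea in c516610c can be illustrated in userspace C11. This is a minimal sketch under stated assumptions, not the kernel code: the 64-byte line size, the `aligned_lock` wrapper, and the three lock names are all hypothetical, and the kernel uses its own alignment annotations rather than `alignas`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumed cache line size; real hardware may differ. */
#define CACHELINE_SIZE 64

/* Hypothetical stand-in for a spinlock. The alignas on the member
 * raises the alignment of the whole struct to one cache line. */
struct aligned_lock {
    alignas(CACHELINE_SIZE) int locked;
};

/* Padding makes each lock occupy exactly one line. */
static_assert(sizeof(struct aligned_lock) == CACHELINE_SIZE,
              "each lock should fill one cache line");

/* Three global locks, as in the commit message (names are illustrative). */
static struct aligned_lock list_lock;
static struct aligned_lock state_lock;
static struct aligned_lock data_lock;

/* Index of the cache line that an address falls into. */
static uintptr_t cacheline_of(const void *p)
{
    return (uintptr_t)p / CACHELINE_SIZE;
}

/* Returns 1 if no two of the three locks share a cache line. */
int locks_on_distinct_lines(void)
{
    uintptr_t a = cacheline_of(&list_lock);
    uintptr_t b = cacheline_of(&state_lock);
    uintptr_t c = cacheline_of(&data_lock);
    return a != b && b != c && a != c;
}
```

Without the alignment, three small locks declared next to each other could land in one line, so a CPU taking one lock would invalidate the line holding the other two.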

quota: Move quota files into separate directory · 884d179d

Committed by Jan Kara on Jan 26, 2009

Quota subsystem has more and more files. It's time to create a dir for it.
Signed-off-by: Jan Kara <jack@suse.cz>

884d179d

ext4: quota reservation for delayed allocation · 60e58e0f

Committed by Mingming Cao on Jan 22, 2009

Uses quota reservation/claim/release to handle quota properly for delayed
allocation in three steps: 1) quotas are reserved when data is copied
to cache and block allocation is deferred, 2) when new blocks are allocated,
reserved quotas are converted to real allocated quota, and 3) over-booked
quotas for metadata blocks are released back.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>

60e58e0f

reiserfs: Remove unnecessary quota functions · 643d00cc

Committed by Jan Kara on Jan 12, 2009

reiserfs_dquot_initialize() and reiserfs_dquot_drop() are no longer
needed because of modified quota locking.
Signed-off-by: Jan Kara <jack@suse.cz>

643d00cc

ext4: Remove unnecessary quota functions · edf72453

Committed by Jan Kara on Jan 12, 2009

ext4_dquot_initialize() and ext4_dquot_drop() are no longer
needed because of modified quota locking.
Signed-off-by: Jan Kara <jack@suse.cz>

edf72453

ext3: Remove unnecessary quota functions · a219ce37

Committed by Jan Kara on Jan 12, 2009

ext3_dquot_initialize() and ext3_dquot_drop() are no longer
needed because of modified quota locking.
Signed-off-by: Jan Kara <jack@suse.cz>

a219ce37

quota: Move EXPORT_SYMBOL immediately next to the functions/variables · 08d0350c

Committed by Mingming Cao on Jan 14, 2009

According to checkpatch: EXPORT_SYMBOL(foo); should immediately follow its
function/variable
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>

08d0350c

quota: Add quota reservation claim and released operations · 740d9dcd

Committed by Mingming Cao on Jan 13, 2009

Reserved quota will be claimed at the block allocation time. Over-booked
quota could be returned back with the release callback function.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>

740d9dcd

quota: Add quota reservation support · f18df228

Committed by Mingming Cao on Jan 13, 2009

Delayed allocation defers block allocation until the dirty pages
flush-out time; doing the quota charge/check at that time is too late.
But we can't charge the quota blocks until blocks are really allocated,
otherwise users could get overcharged after reboot from a system crash.

This patch adds quota reservation for delayed allocation. Quota blocks
are reserved in memory; the inode and quota won't get dirtied until later
block allocation time.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>

f18df228
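
The reserve/claim/release accounting described in f18df228, 740d9dcd and 60e58e0f can be modelled with three counters. The following is a minimal userspace sketch, not the kernel interface: `quota_model` and the three function names are hypothetical, and real quota code also handles locking, soft limits and on-disk state.

```c
#include <assert.h>

/* Hypothetical in-memory quota state; field names are illustrative. */
struct quota_model {
    long limit;     /* hard block limit */
    long used;      /* blocks really allocated (what gets persisted) */
    long reserved;  /* blocks promised to delayed allocations, memory only */
};

/* Reserve blocks at write time. Returns 0 on success, -1 if the
 * reservation would push usage plus reservations over the limit. */
int quota_reserve(struct quota_model *q, long blocks)
{
    if (q->used + q->reserved + blocks > q->limit)
        return -1;
    q->reserved += blocks;
    return 0;
}

/* Convert reserved blocks into real usage once blocks are allocated. */
void quota_claim(struct quota_model *q, long blocks)
{
    assert(blocks <= q->reserved);
    q->reserved -= blocks;
    q->used += blocks;
}

/* Return over-booked reservation that was never needed. */
void quota_release(struct quota_model *q, long blocks)
{
    assert(blocks <= q->reserved);
    q->reserved -= blocks;
}
```

Because only `used` would be written out in this model, a crash between reserve and claim loses the reservation but never overcharges the user, which is the property the commit message asks for.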

23 Mar, 2009 3 commits

Update my email address · f762dd68

Committed by Gertjan van Wingerde on Mar 21, 2009

Update all previous incarnations of my email address to the correct one.
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f762dd68

eCryptfs: NULL crypt_stat dereference during lookup · 2aac0cf8

Committed by Tyler Hicks on Mar 20, 2009

If ecryptfs_encrypted_view or ecryptfs_xattr_metadata were being
specified as mount options, a NULL pointer dereference of crypt_stat
was possible during lookup.

This patch moves the crypt_stat assignment into
ecryptfs_lookup_and_interpose_lower(), ensuring that crypt_stat
will not be NULL before we attempt to dereference it.

Thanks to Dan Carpenter and his static analysis tool, smatch, for
finding this bug.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Acked-by: Dustin Kirkland <kirkland@canonical.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2aac0cf8

eCryptfs: Allocate a variable number of pages for file headers · 8faece5f

Committed by Tyler Hicks on Mar 20, 2009

When allocating the memory used to store the eCryptfs header contents, a
single, zeroed page was being allocated with get_zeroed_page().
However, the size of an eCryptfs header is either PAGE_CACHE_SIZE or
ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE (8192), whichever is larger, and is
stored in the file's private_data->crypt_stat->num_header_bytes_at_front
field.

ecryptfs_write_metadata_to_contents() was using
num_header_bytes_at_front to decide how many bytes should be written to
the lower filesystem for the file header.  Unfortunately, at least 8K
was being written from the page, despite the chance of the single,
zeroed page being smaller than 8K.  This resulted in random areas of
kernel memory being written between the 0x1000 and 0x1FFF bytes offsets
in the eCryptfs file headers if PAGE_SIZE was 4K.

This patch allocates a variable number of pages, calculated with
num_header_bytes_at_front, and passes the number of allocated pages
along to ecryptfs_write_metadata_to_contents().

Thanks to Florian Streibelt for reporting the data leak and working with
me to find the problem.  2.6.28 is the only kernel release with this
vulnerability.  Corresponds to CVE-2009-0787.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Acked-by: Dustin Kirkland <kirkland@canonical.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Eugene Teo <eugeneteo@kernel.sg>
Cc: Greg KH <greg@kroah.com>
Cc: dann frazier <dannf@dannf.org>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Florian Streibelt <florian@f-streibelt.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8faece5f
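
The page-count arithmetic behind 8faece5f can be worked through in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, and only the 8192-byte minimum and the "whichever is larger" rule come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum header extent size quoted in the commit message. */
#define MIN_HEADER_EXTENT_SIZE 8192

/* Header size: the larger of the page size and the minimum extent size. */
size_t header_bytes(size_t page_size)
{
    return page_size > MIN_HEADER_EXTENT_SIZE ? page_size
                                              : MIN_HEADER_EXTENT_SIZE;
}

/* Whole pages needed to hold the header, rounded up. */
size_t header_pages(size_t page_size)
{
    assert(page_size > 0);
    return (header_bytes(page_size) + page_size - 1) / page_size;
}
```

With 4K pages the header is 8192 bytes and needs two pages, so a single zeroed page leaves the second half of the write reading whatever follows it in memory. With 64K pages one page is enough.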

20 Mar, 2009 3 commits

aio: lookup_ioctx can return the wrong value when looking up a bogus context · 65c24491

Committed by Jeff Moyer on Mar 18, 2009

The libaio test harness turned up a problem whereby lookup_ioctx on a
bogus io context was returning the 1 valid io context from the list
(harness/cases/3.p).

Because of that, an extra put_iocontext was done, and when the process
exited, it hit a BUG_ON in the put_iocontext macro called from exit_aio
(since we expect a users count of 1 and instead get 0).

The problem was introduced by "aio: make the lookup_ioctx() lockless"
(commit abf137dd).

Thanks to Zach for pointing out that hlist_for_each_entry_rcu will not
return with a NULL tpos at the end of the loop, even if the entry was
not found.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

65c24491
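
The pitfall in 65c24491 is a loop cursor that still points at a valid entry after an unsuccessful search. The sketch below reproduces that shape with a plain singly linked list. It is an illustration only: `struct ctx` and both lookup functions are hypothetical and do not use the kernel's RCU list macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context entry in a singly linked list. */
struct ctx {
    unsigned long id;
    struct ctx *next;
};

/* Buggy pattern: pos keeps pointing at the last visited entry, so a
 * lookup for a bogus id returns a valid but unrelated context. */
struct ctx *lookup_buggy(struct ctx *head, unsigned long id)
{
    struct ctx *pos = NULL;
    for (struct ctx *n = head; n != NULL; n = n->next) {
        pos = n;
        if (pos->id == id)
            break;
    }
    return pos;
}

/* Fixed pattern: the result is assigned only when a match is found. */
struct ctx *lookup_fixed(struct ctx *head, unsigned long id)
{
    struct ctx *ret = NULL;
    for (struct ctx *n = head; n != NULL; n = n->next) {
        if (n->id == id) {
            ret = n;
            break;
        }
    }
    /* Postcondition: either nothing was found or the ids match. */
    assert(ret == NULL || ret->id == id);
    return ret;
}
```

With a list holding a single context, the buggy version hands that context back for any id, which matches the symptom in the commit message: an extra reference drop on a context the caller never owned.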

eventfd: remove fput() call from possible IRQ context · 87c3a86e

Committed by Davide Libenzi on Mar 18, 2009

Remove a source of fput() call from inside IRQ context.  Myself, like Eric,
wasn't able to reproduce an fput() call from IRQ context, but Jeff said he was
able to, with the attached test program.  Independently from this, the bug is
conceptually there, so we might be better off fixing it.  This patch adds an
optimization similar to the one we already do on ->ki_filp, on ->ki_eventfd.
Playing with ->f_count directly is not pretty in general, but the alternative
here would be to add a brand new delayed fput() infrastructure, that I'm not
sure is worth it.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

87c3a86e

Fix race in create_empty_buffers() vs __set_page_dirty_buffers() · a8e7d49a

Committed by Linus Torvalds on Mar 19, 2009

Nick Piggin noticed this (very unlikely) race between setting a page
dirty and creating the buffers for it - we need to hold the mapping
private_lock until we've set the page dirty bit in order to make sure
that create_empty_buffers() might not build up a set of buffers without
the dirty bits set when the page is dirty.

I doubt anybody has ever hit this race (and it didn't solve the issue
Nick was looking at), but as Nick says: "Still, it does appear to solve
a real race, which we should close."
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a8e7d49a

18 Mar, 2009 2 commits

NFSD: provide encode routine for OP_OPENATTR · 84f09f46

Committed by Benny Halevy on Mar 04, 2009

Although this operation is unsupported by our implementation
we still need to provide an encode routine for it to
merely encode its (error) status back in the compound reply.

Thanks to Bill Baker at sun.com for testing with the Sun
OpenSolaris client, finding, and reporting this bug at
Connectathon 2009.

This bug was introduced in 2.6.27.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

84f09f46

Avoid 64-bit "switch()" statements on 32-bit architectures · ee568b25

Committed by Linus Torvalds on Mar 17, 2009

Commit ee6f779b ("filp->f_pos not
correctly updated in proc_task_readdir") changed the proc code to use
filp->f_pos directly, rather than through a temporary variable.  In the
process, that caused the operations to be done on the full 64 bits, even
though the offset is never that big.

That's all fine and dandy per se, but for some unfathomable reason gcc
generates absolutely horrid code when using 64-bit values in switch()
statements.  To the point of actually calling out to gcc helper
functions like __cmpdi2 rather than just doing the trivial comparisons
directly the way gcc does for normal compares.  At which point we get
link failures, because we really don't want to support that kind of
crazy code.

Fix this by just casting the f_pos value to "unsigned long", which
is plenty big enough for /proc, and avoids the gcc code generation issue.
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Zhang Le <r0bertz@gentoo.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ee568b25
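
The cast described in ee568b25 looks roughly like the following. This is a hedged sketch rather than the proc code itself: `classify_pos`, the `entry_kind` enum, and the meaning given to offsets 0 and 1 are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of a directory offset. */
enum entry_kind { ENTRY_DOT, ENTRY_DOTDOT, ENTRY_TASK };

/* pos models a 64-bit file position. Casting it to unsigned long makes
 * the switch operate on a native word, so a 32-bit compiler has no
 * reason to emit 64-bit comparison helpers. */
enum entry_kind classify_pos(int64_t pos)
{
    assert(pos >= 0);  /* file positions are never negative here */
    switch ((unsigned long)pos) {
    case 0:
        return ENTRY_DOT;
    case 1:
        return ENTRY_DOTDOT;
    default:
        return ENTRY_TASK;
    }
}
```

The cast is only safe because the offsets involved are small, as the commit message notes; it would truncate on a 32-bit target if the value could exceed the range of `unsigned long`.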

17 Mar, 2009 1 commit

ext4: fix bb_prealloc_list corruption due to wrong group locking · d33a1976

Committed by Eric Sandeen on Mar 16, 2009

This is for Red Hat bug 490026: EXT4 panic, list corruption in
ext4_mb_new_inode_pa

ext4_lock_group(sb, group) is supposed to protect this list for
each group, and a common code flow to remove a prealloc context (pa)
is like this:

    ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
    ext4_lock_group(sb, grp);
    list_del(&pa->pa_group_list);
    ext4_unlock_group(sb, grp);

so it's critical that we get the right group number back for
this prealloc context, to lock the right group (the one 
associated with this pa) and prevent concurrent list manipulation.

However, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a
comment, "-1 is to protect from crossing allocation group".

This makes sense for the group_pa, where pa_pstart is advanced
by the length which has been used (in ext4_mb_release_context()),
and when the entire length has been used, pa_pstart has been
advanced to the first block of the next group.

However, for inode_pa, pa_pstart is never advanced; it's just
set once to the first block in the group and not moved after
that.  So in this case, if we subtract one in ext4_mb_put_pa(),
we are actually locking the *previous* group, and opening the
race with the other threads which do not subtract off the extra
block.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d33a1976
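
The off-by-one in d33a1976 is easy to see with concrete numbers. The sketch below assumes groups start at block 0 and uses a made-up group size. Both function names are hypothetical, and real ext4 group arithmetic also accounts for the first data block.

```c
#include <assert.h>

/* Map a block number to its group, assuming groups start at block 0
 * (a simplification made for this illustration). */
unsigned long group_of(unsigned long block, unsigned long blocks_per_group)
{
    assert(blocks_per_group > 0);
    return block / blocks_per_group;
}

/* Group that would be locked when the caller subtracts one block first. */
unsigned long group_of_minus_one(unsigned long pstart,
                                 unsigned long blocks_per_group)
{
    assert(pstart > 0);
    return group_of(pstart - 1, blocks_per_group);
}
```

With 8192 blocks per group, an inode pa whose start stays at block 24576 belongs to group 3, yet subtracting one selects group 2, so the wrong lock is taken. A fully consumed group pa whose start has advanced to 32768 still maps back to group 3 after the subtraction, which is the case the "-1" was written for.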

16 Mar, 2009 1 commit

filp->f_pos not correctly updated in proc_task_readdir · ee6f779b

Committed by Zhang Le on Mar 16, 2009

filp->f_pos only gets updated at the end of the function. Thus d_off of those
dirents that are in the middle will be 0, and this will cause a problem in
glibc's readdir implementation, specifically an endless loop. Because when overflow
occurs, f_pos will be set to the next dirent to read, however it will be 0, unless
the next one is the last one. So it will start over again and again.

There is a sample program in man 2 getdents. This is the output of the program
running on a multithreaded program's task dir before this patch is applied:

  $ ./a.out /proc/3807/task
  --------------- nread=128 ---------------
  i-node#  file type  d_reclen  d_off   d_name
    506442  directory    16          1  .
    506441  directory    16          0  ..
    506443  directory    16          0  3807
    506444  directory    16          0  3809
    506445  directory    16          0  3812
    506446  directory    16          0  3861
    506447  directory    16          0  3862
    506448  directory    16          8  3863

This is the output after this patch is applied:

  $ ./a.out /proc/3807/task
  --------------- nread=128 ---------------
  i-node#  file type  d_reclen  d_off   d_name
    506442  directory    16          1  .
    506441  directory    16          2  ..
    506443  directory    16          3  3807
    506444  directory    16          4  3809
    506445  directory    16          5  3812
    506446  directory    16          6  3861
    506447  directory    16          7  3862
    506448  directory    16          8  3863
Signed-off-by: Zhang Le <r0bertz@gentoo.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ee6f779b

15 Mar, 2009 5 commits

block: fix memory leak in bio_clone() · 059ea331

Committed by Li Zefan on Mar 09, 2009

If bio_integrity_clone() fails, bio_clone() returns NULL without freeing
the newly allocated bio.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

059ea331

block: Add gfp_mask parameter to bio_integrity_clone() · 87092698

Committed by Jun'ichi Nomura on Mar 09, 2009

Stricter gfp_mask might be required for clone allocation.
For example, request-based dm may clone bio in interrupt context
so it has to use GFP_ATOMIC.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

87092698

eCryptfs: don't encrypt file key with filename key · 84814d64

Committed by Tyler Hicks on Mar 13, 2009

eCryptfs has file encryption keys (FEK), file encryption key encryption
keys (FEKEK), and filename encryption keys (FNEK).  The per-file FEK is
encrypted with one or more FEKEKs and stored in the header of the
encrypted file.  I noticed that the FEK is also being encrypted by the
FNEK.  This is a problem if a user wants to use a different FNEK than
their FEKEK, as their file contents will still be accessible with the
FNEK.

This is a minimalistic patch which prevents the FNEKs signatures from
being copied to the inode signatures list.  Ultimately, it keeps the FEK
from being encrypted with a FNEK.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Acked-by: Dustin Kirkland <kirkland@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

84814d64

nommu: ramfs: don't leak pages when adding to page cache fails · 15e7b876

Committed by Johannes Weiner on Mar 13, 2009

When a ramfs nommu mapping is expanded, contiguous pages are allocated
and added to the pagecache.  The caller's reference is then passed on
by moving whole pagevecs to the file lru list.

If the page cache adding fails, make sure that the error path also
moves the pagevec contents which might still contain up to PAGEVEC_SIZE
successfully added pages, of which we would leak references otherwise.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Enrik Berkhan <Enrik.Berkhan@ge.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

15e7b876

nommu: ramfs: pages allocated to an inode's pagecache may get wrongly discarded · 020fe22f

Committed by Enrik Berkhan on Mar 13, 2009

The pages attached to a ramfs inode's pagecache by truncation from nothing
- as done by SYSV SHM for example - may get discarded under memory
pressure.

The problem is that the pages are not marked dirty.  Anything that creates
data in an MMU-based ramfs will cause the set_page_dirty() aop to be
called on the pages holding that data.

For the NOMMU-based mmap, set_page_dirty() may be called by write(), but
it won't be called by page-writing faults on writable mmaps, and it isn't
called by ramfs_nommu_expand_for_mapping() when a file is being truncated
from nothing to allocate a contiguous run.

The solution is to mark the pages dirty at the point of allocation by the
truncation code.
Signed-off-by: Enrik Berkhan <Enrik.Berkhan@ge.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

020fe22f

14 Mar, 2009 1 commit

ext4: fix bogus BUG_ONs in mballoc code · 8d03c7a0

Committed by Eric Sandeen on Mar 14, 2009

Thiemo Nagel reported that:

# dd if=/dev/zero of=image.ext4 bs=1M count=2
# mkfs.ext4 -v -F -b 1024 -m 0 -g 512 -G 4 -I 128 -N 1 \
  -O large_file,dir_index,flex_bg,extent,sparse_super image.ext4
# mount -o loop image.ext4 mnt/
# dd if=/dev/zero of=mnt/file

oopsed, with a BUG_ON in ext4_mb_normalize_request because
size == EXT4_BLOCKS_PER_GROUP

It appears to me (esp. after talking to Andreas) that the BUG_ON
is bogus; a request of exactly EXT4_BLOCKS_PER_GROUP should
be allowed, though larger sizes do indicate a problem.

Fix that and another (apparently rare) codepath with a similar check.
Reported-by: Thiemo Nagel <thiemo.nagel@ph.tum.de>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8d03c7a0

13 Mar, 2009 9 commits

ocfs2: Use xs->bucket to set xattr value outside · 712e53e4

Committed by Tao Ma on Mar 12, 2009

A long time ago, xs->base was allocated with a 4K size and all the contents
of the bucket were copied to it. Now we use ocfs2_xattr_bucket to
abstract the xattr bucket, and xs->base is initialized to the start of
bu_bhs[0]. So xs->base + offset will overflow when the value root is
stored outside the first block.

Then why have we survived the xattr test until now? It is because we always
read the bucket contiguously now and the kernel mm allocates contiguous
memory for us. We are lucky, but we should fix it. So just get the
right value root as other callers do.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

712e53e4

ocfs2: Fix a bug found by sparse check. · 74e77eb3

Committed by Tao Ma on Mar 12, 2009

We need to use le32_to_cpu to test rec->e_cpos in
ocfs2_dinode_insert_check.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

74e77eb3
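
The conversion that 74e77eb3 adds exists because on-disk fields are stored little-endian and must be decoded before they are compared on a big-endian host. The sketch below shows the general idea in portable C. `le32_decode` is a hypothetical helper written for this illustration and is not the kernel's `le32_to_cpu`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 32-bit on-disk field, independent of the
 * byte order of the machine running the code. */
uint32_t le32_decode(const unsigned char bytes[4])
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0] |
           ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) |
           ((uint32_t)bytes[3] << 24);
}
```

Comparing the raw field against a host-order constant happens to work on little-endian machines, which is why this class of bug tends to be caught by static checkers such as sparse rather than by testing.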

ocfs2: tweak to get the maximum inline data size with xattr · d9ae49d6

Committed by Tiger Yang on Mar 05, 2009

Replace max_inline_data with max_inline_data_with_xattr
to ensure it is correct when the xattr is inlined.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

d9ae49d6

ocfs2: reserve xattr block for new directory with inline data · 6c9fd1dc

Committed by Tiger Yang on Mar 06, 2009

If this is a new directory with inline data, we choose to
reserve the entire inline area for directory contents and
force an external xattr block.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

6c9fd1dc

fs: new inode i_state corruption fix · 7ef0d737

Committed by Nick Piggin on Mar 12, 2009

There was a report of a data corruption
http://lkml.org/lkml/2008/11/14/121.  There is a script included to
reproduce the problem.

During testing, I encountered a number of strange things with ext3, so I
tried ext2 to attempt to reduce complexity of the problem.  I found that
fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
cleared, even though instrumentation showed that unlock_new_inode had
already been called for that inode.  This points to memory scribble, or
synchronisation problem.

i_state of I_NEW inodes is not protected by inode_lock because other
processes are not supposed to touch them until I_LOCK (and I_NEW) is
cleared.  Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
i_state revealed that generic_sync_sb_inodes is picking up new inodes from
the inode lists and passing them to __writeback_single_inode without
waiting for I_NEW.  Subsequently modifying i_state causes corruption.  In
my case it would look like this:

CPU0                            CPU1
unlock_new_inode()              __sync_single_inode()
 reg <- inode->i_state
 reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
 reg -> inode->i_state          reg -> reg | I_SYNC
                                reg -> inode->i_state

Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.

Fix for this is rather than wait for I_NEW inodes, just skip over them:
inodes concurrently being created are not subject to data integrity
operations, and should not significantly contribute to dirty memory
either.

After this change, I'm unable to reproduce any of the added warnings or
hangs after ~1hour of running.  Previously, the new warnings would start
immediately and hang would happen in under 5 minutes.

I'm also testing on ext3 now, and so far no problems there either.  I
don't know whether this fixes the problem reported above, but it fixes a
real problem for me.

Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7ef0d737
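
The lost update shown in the CPU0/CPU1 table of 7ef0d737 can be replayed deterministically in a single thread by executing the loads and stores in the same order. This is a sketch only: the flag values are made up, and the function simulates the interleaving rather than racing real threads.

```c
#include <assert.h>

/* Hypothetical flag values; only their distinctness matters here. */
#define I_NEW  0x1u
#define I_LOCK 0x2u
#define I_SYNC 0x4u

/* Replay the interleaving from the commit message step by step and
 * return the final i_state. */
unsigned int interleaved_result(void)
{
    unsigned int i_state = I_NEW | I_LOCK;
    unsigned int reg0, reg1;

    reg0 = i_state;             /* CPU0: load */
    reg0 &= ~(I_LOCK | I_NEW);  /* CPU0: clear the new-inode flags */
    reg1 = i_state;             /* CPU1: load, flags still set */
    assert((reg0 & (I_NEW | I_LOCK)) == 0);
    i_state = reg0;             /* CPU0: store, flags now cleared */
    reg1 |= I_SYNC;             /* CPU1: modify its stale copy */
    i_state = reg1;             /* CPU1: store, overwriting CPU0 */

    return i_state;
}
```

The final value has I_NEW and I_LOCK set again alongside I_SYNC, so a waiter in wait_on_inode would keep sleeping even though unlock_new_inode had already run, which matches the hang described above.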

vfs: add missing unlock in sget() · a3cfbb53

Committed by Li Zefan on Mar 12, 2009

In sget(), destroy_super(s) is called with s->s_umount held, which makes
lockdep unhappy.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a3cfbb53

pipe_rdwr_fasync: fix the error handling to prevent the leak/crash · e5bc49ba

Committed by Oleg Nesterov on Mar 12, 2009

If the second fasync_helper() fails, pipe_rdwr_fasync() returns the error
but leaves the file on ->fasync_readers.

This was always wrong, but since 233e70f4
"saner FASYNC handling on file close" we have the new problem.  Because in
this case setfl() doesn't set FASYNC bit, __fput() will not do
->fasync(0), and we leak fasync_struct with ->fa_file pointing to the
freed file.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e5bc49ba

NFS: Fix the fix to Bugzilla #11061, when IPv6 isn't defined... · 9f4c899c

Committed by Trond Myklebust on Mar 12, 2009

Stephen Rothwell reports:

Today's linux-next build (powerpc ppc64_defconfig) failed like this:

fs/built-in.o: In function `.nfs_get_client':
client.c:(.text+0x115010): undefined reference to `.__ipv6_addr_type'

Fix by moving the IPV6 specific parts of commit
d7371c41 ("Bug 11061, NFS mounts dropped")
into the "#ifdef IPV6..." section.

Also fix up a couple of formatting issues.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

9f4c899c

ext4: Print the find_group_flex() warning only once · 2842c3b5

Committed by Theodore Ts'o on Mar 12, 2009

This is a short-term warning, and even printk_ratelimit() can result
in too much noise in system logs. So only print it once as a warning.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2842c3b5

openanolis / cloud-kernel synced successfully about 1 year ago
