提交 · bddaafa11a549fff311bcf2e04bbfb5139812cb7 · OpenHarmony / kernel_linux

29 3月, 2009 4 次提交

xfs: pagecache usage optimization · bddaafa1

由 Hisashi Hifumi 提交于 3月 29, 2009

Hi.

I introduced "is_partially_uptodate" aops for XFS.

A page can have multiple buffers and even if a page is not uptodate,
some buffers can be uptodate on pagesize != blocksize environment.

This aops checks that all buffers which correspond to a part of a file
that we want to read are uptodate. If so, we do not have to issue actual
read IO to HDD even if a page is not uptodate because the portion we
want to read are uptodate.

"block_is_partially_uptodate" function is already used by ext2/3/4.
With the following patch random read/write mixed workloads or random read
after random write workloads can be optimized and we can get performance
improvement.

I did a performance test using the sysbench.

#sysbench --num-threads=4 --max-requests=100000 --test=fileio --file-num=1 \
--file-block-size=8K --file-total-size=1G --file-test-mode=rndrw \
--file-fsync-freq=0 --file-rw-ratio=0.5 run

-2.6.29-rc6
Test execution summary:
    total time:                          123.8645s
    total number of events:              100000
    total time taken by event execution: 442.4994
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0044s
         max:                            0.3387s
         approx.  95 percentile:         0.0118s

-2.6.29-rc6-patched
Test execution summary:
    total time:                          108.0757s
    total number of events:              100000
    total time taken by event execution: 417.7505
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0042s
         max:                            0.3217s
         approx.  95 percentile:         0.0118s

arch: ia64
pagesize: 16k
blocksize: 4k
Signed-off-by: NHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>

bddaafa1

xfs: remove m_litino · 6447c362

由 Christoph Hellwig 提交于 3月 29, 2009

With the upcoming v3 inodes the inode data/attr area size needs to be
calculated for each specific inode, so we can't cache it in the superblock
anymore.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>

6447c362

xfs: kill ino64 mount option · a19d9f88

由 Christoph Hellwig 提交于 3月 29, 2009

The ino64 mount option adds a fixed offset to 32bit inode numbers
to bring them into the 64bit range.  There's no need for this kind
of debug tool given that it's easy to produce real 64bit inode numbers
for testing.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>

a19d9f88

xfs: kill mutex_t typedef · a0b0b8a5

由 Christoph Hellwig 提交于 3月 29, 2009

People continue to complain about this for weird reasons, but there's
really no point in keeping this typedef for a couple of users anyway.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>

a0b0b8a5

23 3月, 2009 3 次提交

Update my email address · f762dd68

由 Gertjan van Wingerde 提交于 3月 21, 2009

Update all previous incarnations of my email address to the correct one.
Signed-off-by: NGertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f762dd68

eCryptfs: NULL crypt_stat dereference during lookup · 2aac0cf8

由 Tyler Hicks 提交于 3月 20, 2009

If ecryptfs_encrypted_view or ecryptfs_xattr_metadata were being
specified as mount options, a NULL pointer dereference of crypt_stat
was possible during lookup.

This patch moves the crypt_stat assignment into
ecryptfs_lookup_and_interpose_lower(), ensuring that crypt_stat
will not be NULL before we attempt to dereference it.

Thanks to Dan Carpenter and his static analysis tool, smatch, for
finding this bug.
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
Acked-by: NDustin Kirkland <kirkland@canonical.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2aac0cf8

eCryptfs: Allocate a variable number of pages for file headers · 8faece5f

由 Tyler Hicks 提交于 3月 20, 2009

When allocating the memory used to store the eCryptfs header contents, a
single, zeroed page was being allocated with get_zeroed_page().
However, the size of an eCryptfs header is either PAGE_CACHE_SIZE or
ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE (8192), whichever is larger, and is
stored in the file's private_data->crypt_stat->num_header_bytes_at_front
field.

ecryptfs_write_metadata_to_contents() was using
num_header_bytes_at_front to decide how many bytes should be written to
the lower filesystem for the file header.  Unfortunately, at least 8K
was being written from the page, despite the chance of the single,
zeroed page being smaller than 8K.  This resulted in random areas of
kernel memory being written between the 0x1000 and 0x1FFF bytes offsets
in the eCryptfs file headers if PAGE_SIZE was 4K.

This patch allocates a variable number of pages, calculated with
num_header_bytes_at_front, and passes the number of allocated pages
along to ecryptfs_write_metadata_to_contents().

Thanks to Florian Streibelt for reporting the data leak and working with
me to find the problem.  2.6.28 is the only kernel release with this
vulnerability.  Corresponds to CVE-2009-0787
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
Acked-by: NDustin Kirkland <kirkland@canonical.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NEugene Teo <eugeneteo@kernel.sg>
Cc: Greg KH <greg@kroah.com>
Cc: dann frazier <dannf@dannf.org>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Florian Streibelt <florian@f-streibelt.de>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8faece5f

20 3月, 2009 3 次提交

aio: lookup_ioctx can return the wrong value when looking up a bogus context · 65c24491

由 Jeff Moyer 提交于 3月 18, 2009

The libaio test harness turned up a problem whereby lookup_ioctx on a
bogus io context was returning the 1 valid io context from the list
(harness/cases/3.p).

Because of that, an extra put_iocontext was done, and when the process
exited, it hit a BUG_ON in the put_iocontext macro called from exit_aio
(since we expect a users count of 1 and instead get 0).

The problem was introduced by "aio: make the lookup_ioctx() lockless"
(commit abf137dd).

Thanks to Zach for pointing out that hlist_for_each_entry_rcu will not
return with a NULL tpos at the end of the loop, even if the entry was
not found.
Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
Acked-by: NZach Brown <zach.brown@oracle.com>
Acked-by: NJens Axboe <jens.axboe@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

65c24491

eventfd: remove fput() call from possible IRQ context · 87c3a86e

由 Davide Libenzi 提交于 3月 18, 2009

Remove a source of fput() call from inside IRQ context.  Myself, like Eric,
wasn't able to reproduce an fput() call from IRQ context, but Jeff said he was
able to, with the attached test program.  Independently from this, the bug is
conceptually there, so we might be better off fixing it.  This patch adds an
optimization similar to the one we already do on ->ki_filp, on ->ki_eventfd.
Playing with ->f_count directly is not pretty in general, but the alternative
here would be to add a brand new delayed fput() infrastructure, that I'm not
sure is worth it.
Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

87c3a86e

Fix race in create_empty_buffers() vs __set_page_dirty_buffers() · a8e7d49a

由 Linus Torvalds 提交于 3月 19, 2009

Nick Piggin noticed this (very unlikely) race between setting a page
dirty and creating the buffers for it - we need to hold the mapping
private_lock until we've set the page dirty bit in order to make sure
that create_empty_buffers() might not build up a set of buffers without
the dirty bits set when the page is dirty.

I doubt anybody has ever hit this race (and it didn't solve the issue
Nick was looking at), but as Nick says: "Still, it does appear to solve
a real race, which we should close."
Acked-by: NNick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a8e7d49a

18 3月, 2009 2 次提交

NFSD: provide encode routine for OP_OPENATTR · 84f09f46

由 Benny Halevy 提交于 3月 04, 2009

Although this operation is unsupported by our implementation
we still need to provide an encode routine for it to
merely encode its (error) status back in the compound reply.

Thanks for Bill Baker at sun.com for testing with the Sun
OpenSolaris' client, finding, and reporting this bug at
Connectathon 2009.

This bug was introduced in 2.6.27
Signed-off-by: NBenny Halevy <bhalevy@panasas.com>
Cc: stable@kernel.org
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>

84f09f46

Avoid 64-bit "switch()" statements on 32-bit architectures · ee568b25

由 Linus Torvalds 提交于 3月 17, 2009

Commit ee6f779b ("filp->f_pos not
correctly updated in proc_task_readdir") changed the proc code to use
filp->f_pos directly, rather than through a temporary variable.  In the
process, that caused the operations to be done on the full 64 bits, even
though the offset is never that big.

That's all fine and dandy per se, but for some unfathomable reason gcc
generates absolutely horrid code when using 64-bit values in switch()
statements.  To the point of actually calling out to gcc helper
functions like __cmpdi2 rather than just doing the trivial comparisons
directly the way gcc does for normal compares.  At which point we get
link failures, because we really don't want to support that kind of
crazy code.

Fix this by just casting the f_pos value to "unsigned long", which
is plenty big enough for /proc, and avoids the gcc code generation issue.
Reported-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Zhang Le <r0bertz@gentoo.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ee568b25

17 3月, 2009 1 次提交

ext4: fix bb_prealloc_list corruption due to wrong group locking · d33a1976

由 Eric Sandeen 提交于 3月 16, 2009

This is for Red Hat bug 490026: EXT4 panic, list corruption in
ext4_mb_new_inode_pa

ext4_lock_group(sb, group) is supposed to protect this list for
each group, and a common code flow to remove an album is like
this:

    ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
    ext4_lock_group(sb, grp);
    list_del(&pa->pa_group_list);
    ext4_unlock_group(sb, grp);

so it's critical that we get the right group number back for
this prealloc context, to lock the right group (the one 
associated with this pa) and prevent concurrent list manipulation.

however, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a 
comment, "-1 is to protect from crossing allocation group".

This makes sense for the group_pa, where pa_pstart is advanced
by the length which has been used (in ext4_mb_release_context()),
and when the entire length has been used, pa_pstart has been
advanced to the first block of the next group.

However, for inode_pa, pa_pstart is never advanced; it's just
set once to the first block in the group and not moved after
that.  So in this case, if we subtract one in ext4_mb_put_pa(),
we are actually locking the *previous* group, and opening the
race with the other threads which do not subtract off the extra
block.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

d33a1976

16 3月, 2009 8 次提交

filp->f_pos not correctly updated in proc_task_readdir · ee6f779b

由 Zhang Le 提交于 3月 16, 2009

filp->f_pos only get updated at the end of the function. Thus d_off of those
dirents who are in the middle will be 0, and this will cause a problem in
glibc's readdir implementation, specifically endless loop. Because when overflow
occurs, f_pos will be set to next dirent to read, however it will be 0, unless
the next one is the last one. So it will start over again and again.

There is a sample program in man 2 gendents. This is the output of the program
running on a multithread program's task dir before this patch is applied:

  $ ./a.out /proc/3807/task
  --------------- nread=128 ---------------
  i-node#  file type  d_reclen  d_off   d_name
    506442  directory    16          1  .
    506441  directory    16          0  ..
    506443  directory    16          0  3807
    506444  directory    16          0  3809
    506445  directory    16          0  3812
    506446  directory    16          0  3861
    506447  directory    16          0  3862
    506448  directory    16          8  3863

This is the output after this patch is applied

  $ ./a.out /proc/3807/task
  --------------- nread=128 ---------------
  i-node#  file type  d_reclen  d_off   d_name
    506442  directory    16          1  .
    506441  directory    16          2  ..
    506443  directory    16          3  3807
    506444  directory    16          4  3809
    506445  directory    16          5  3812
    506446  directory    16          6  3861
    506447  directory    16          7  3862
    506448  directory    16          8  3863
Signed-off-by: NZhang Le <r0bertz@gentoo.org>
Acked-by: NAl Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ee6f779b

D
xfs: factor out code to find the longest free extent in the AG · 6cc87645
由 Dave Chinner 提交于 3月 16, 2009
```
Signed-off-by: NDave Chinner <dgc@sgi.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
```
6cc87645

xfs: kill VN_BAD · cb4c8cc1

由 Christoph Hellwig 提交于 3月 16, 2009

Remove this rather pointless wrapper and use is_bad_inode directly.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <david@fromorbit.com>

cb4c8cc1

xfs: kill vn_atime_* helpers. · 8fab451e

由 Christoph Hellwig 提交于 3月 16, 2009

Two out of three are unused already, and the third is better done open-coded
with a comment describing what's going on here.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <david@fromorbit.com>

8fab451e

xfs: cleanup xlog_bread · 076e6acb

由 Christoph Hellwig 提交于 3月 16, 2009

Most callers of xlog_bread need to call xlog_align to get the actual offset.
Consolidate that call into the main xlog_bread and provide a _xlog_bread
for those few that don't want the actual offset.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <david@fromorbit.com>

076e6acb

xfs: cleanup xlog_recover_do_trans · ff0205e0

由 Christoph Hellwig 提交于 3月 16, 2009

Change the big if-elsif-else block handling the different item types
into a more natural switch, remove assignments in conditionals and
remove an out of place comment from centuries ago on IRIX.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <david@fromorbit.com>

ff0205e0

xfs: remove another leftover of the old inode log item format · dd0bbad8

由 Christoph Hellwig 提交于 3月 16, 2009

There's another little snipplet of code left from the handling of the old
inode log item format in xlog_recover_do_inode_trans.  Kill it as it
can't be reached anymore.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <david@fromorbit.com>

dd0bbad8

xfs: cleanup log unmount handling · 21b699c8

由 Christoph Hellwig 提交于 3月 16, 2009

Kill the current xfs_log_unmount wrapper and opencode the two function
calls in the only caller.  Rename the current xfs_log_unmount_dealloc to
xfs_log_unmount as it undoes xfs_log_mount and the new name makes that
more clear.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <david@fromorbit.com>

21b699c8

15 3月, 2009 6 次提交

Fix xfs debug build breakage by pushing xfs_error.h after · da5309cd

由 Felix Blyakher 提交于 3月 12, 2009

xfs_mount.h, which it depends on.
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

da5309cd

block: fix memory leak in bio_clone() · 059ea331

由 Li Zefan 提交于 3月 09, 2009

If bio_integrity_clone() fails, bio_clone() returns NULL without freeing
the newly allocated bio.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

059ea331

block: Add gfp_mask parameter to bio_integrity_clone() · 87092698

由 un'ichi Nomura 提交于 3月 09, 2009

Stricter gfp_mask might be required for clone allocation.
For example, request-based dm may clone bio in interrupt context
so it has to use GFP_ATOMIC.
Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

87092698

eCryptfs: don't encrypt file key with filename key · 84814d64

由 Tyler Hicks 提交于 3月 13, 2009

eCryptfs has file encryption keys (FEK), file encryption key encryption
keys (FEKEK), and filename encryption keys (FNEK).  The per-file FEK is
encrypted with one or more FEKEKs and stored in the header of the
encrypted file.  I noticed that the FEK is also being encrypted by the
FNEK.  This is a problem if a user wants to use a different FNEK than
their FEKEK, as their file contents will still be accessible with the
FNEK.

This is a minimalistic patch which prevents the FNEKs signatures from
being copied to the inode signatures list.  Ultimately, it keeps the FEK
from being encrypted with a FNEK.
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Acked-by: NDustin Kirkland <kirkland@canonical.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

84814d64

nommu: ramfs: don't leak pages when adding to page cache fails · 15e7b876

由 Johannes Weiner 提交于 3月 13, 2009

When a ramfs nommu mapping is expanded, contiguous pages are allocated
and added to the pagecache.  The caller's reference is then passed on
by moving whole pagevecs to the file lru list.

If the page cache adding fails, make sure that the error path also
moves the pagevec contents which might still contain up to PAGEVEC_SIZE
successfully added pages, of which we would leak references otherwise.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Enrik Berkhan <Enrik.Berkhan@ge.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

15e7b876

nommu: ramfs: pages allocated to an inode's pagecache may get wrongly discarded · 020fe22f

由 Enrik Berkhan 提交于 3月 13, 2009

The pages attached to a ramfs inode's pagecache by truncation from nothing
- as done by SYSV SHM for example - may get discarded under memory
pressure.

The problem is that the pages are not marked dirty.  Anything that creates
data in an MMU-based ramfs will cause the pages holding that data will
cause the set_page_dirty() aop to be called.

For the NOMMU-based mmap, set_page_dirty() may be called by write(), but
it won't be called by page-writing faults on writable mmaps, and it isn't
called by ramfs_nommu_expand_for_mapping() when a file is being truncated
from nothing to allocate a contiguous run.

The solution is to mark the pages dirty at the point of allocation by the
truncation code.
Signed-off-by: NEnrik Berkhan <Enrik.Berkhan@ge.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

020fe22f

14 3月, 2009 1 次提交

ext4: fix bogus BUG_ONs in in mballoc code · 8d03c7a0

由 Eric Sandeen 提交于 3月 14, 2009

Thiemo Nagel reported that:

# dd if=/dev/zero of=image.ext4 bs=1M count=2
# mkfs.ext4 -v -F -b 1024 -m 0 -g 512 -G 4 -I 128 -N 1 \
  -O large_file,dir_index,flex_bg,extent,sparse_super image.ext4
# mount -o loop image.ext4 mnt/
# dd if=/dev/zero of=mnt/file

oopsed, with a BUG_ON in ext4_mb_normalize_request because
size == EXT4_BLOCKS_PER_GROUP

It appears to me (esp. after talking to Andreas) that the BUG_ON
is bogus; a request of exactly EXT4_BLOCKS_PER_GROUP should
be allowed, though larger sizes do indicate a problem.

Fix that an another (apparently rare) codepath with a similar check.
Reported-by: NThiemo Nagel <thiemo.nagel@ph.tum.de>
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

8d03c7a0

13 3月, 2009 9 次提交

ocfs2: Use xs->bucket to set xattr value outside · 712e53e4

由 Tao Ma 提交于 3月 12, 2009

A long time ago, xs->base is allocated a 4K size and all the contents
in the bucket are copied to the it. Now we use ocfs2_xattr_bucket to
abstract xattr bucket and xs->base is initialized to the start of the
bu_bhs[0]. So xs->base + offset will overflow when the value root is
stored outside the first block.

Then why we can survive the xattr test by now? It is because we always
read the bucket contiguously now and kernel mm allocate continguous
memory for us. We are lucky, but we should fix it. So just get the
right value root as other callers do.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

712e53e4

ocfs2: Fix a bug found by sparse check. · 74e77eb3

由 Tao Ma 提交于 3月 12, 2009

We need to use le32_to_cpu to test rec->e_cpos in
ocfs2_dinode_insert_check.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

74e77eb3

ocfs2: tweak to get the maximum inline data size with xattr · d9ae49d6

由 Tiger Yang 提交于 3月 05, 2009

Replace max_inline_data with max_inline_data_with_xattr
to ensure it correct when xattr inlined.
Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

d9ae49d6

ocfs2: reserve xattr block for new directory with inline data · 6c9fd1dc

由 Tiger Yang 提交于 3月 06, 2009

If this is a new directory with inline data, we choose to
reserve the entire inline area for directory contents and
force an external xattr block.
Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

6c9fd1dc

fs: new inode i_state corruption fix · 7ef0d737

由 Nick Piggin 提交于 3月 12, 2009

There was a report of a data corruption
http://lkml.org/lkml/2008/11/14/121.  There is a script included to
reproduce the problem.

During testing, I encountered a number of strange things with ext3, so I
tried ext2 to attempt to reduce complexity of the problem.  I found that
fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
cleared, even though instrumentation showed that unlock_new_inode had
already been called for that inode.  This points to memory scribble, or
synchronisation problme.

i_state of I_NEW inodes is not protected by inode_lock because other
processes are not supposed to touch them until I_LOCK (and I_NEW) is
cleared.  Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
i_state revealed that generic_sync_sb_inodes is picking up new inodes from
the inode lists and passing them to __writeback_single_inode without
waiting for I_NEW.  Subsequently modifying i_state causes corruption.  In
my case it would look like this:

CPU0                            CPU1
unlock_new_inode()              __sync_single_inode()
 reg <- inode->i_state
 reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
 reg -> inode->i_state          reg -> reg | I_SYNC
                                reg -> inode->i_state

Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.

Fix for this is rather than wait for I_NEW inodes, just skip over them:
inodes concurrently being created are not subject to data integrity
operations, and should not significantly contribute to dirty memory
either.

After this change, I'm unable to reproduce any of the added warnings or
hangs after ~1hour of running.  Previously, the new warnings would start
immediately and hang would happen in under 5 minutes.

I'm also testing on ext3 now, and so far no problems there either.  I
don't know whether this fixes the problem reported above, but it fixes a
real problem for me.

Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Reported-by: NAdrian Hunter <ext-adrian.hunter@nokia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7ef0d737

vfs: add missing unlock in sget() · a3cfbb53

由 Li Zefan 提交于 3月 12, 2009

In sget(), destroy_super(s) is called with s->s_umount held, which makes
lockdep unhappy.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Menage <menage@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a3cfbb53

pipe_rdwr_fasync: fix the error handling to prevent the leak/crash · e5bc49ba

由 Oleg Nesterov 提交于 3月 12, 2009

If the second fasync_helper() fails, pipe_rdwr_fasync() returns the error
but leaves the file on ->fasync_readers.

This was always wrong, but since 233e70f4
"saner FASYNC handling on file close" we have the new problem.  Because in
this case setfl() doesn't set FASYNC bit, __fput() will not do
->fasync(0), and we leak fasync_struct with ->fa_file pointing to the
freed file.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e5bc49ba

NFS: Fix the fix to Bugzilla #11061, when IPv6 isn't defined... · 9f4c899c

由 Trond Myklebust 提交于 3月 12, 2009

Stephen Rothwell reports:

Today's linux-next build (powerpc ppc64_defconfig) failed like this:

fs/built-in.o: In function `.nfs_get_client':
client.c:(.text+0x115010): undefined reference to `.__ipv6_addr_type'

Fix by moving the IPV6 specific parts of commit
d7371c41 ("Bug 11061, NFS mounts dropped")
into the '#ifdef IPV6..." section.

Also fix up a couple of formatting issues.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

9f4c899c

ext4: Print the find_group_flex() warning only once · 2842c3b5

由 Theodore Ts'o 提交于 3月 12, 2009

This is a short-term warning, and even printk_ratelimit() can result
in too much noise in system logs. So only print it once as a warning.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

2842c3b5

12 3月, 2009 2 次提交

Squashfs: Valid filesystems are flagged as bad by the corrupted fs patch · 363911d0

由 Phillip Lougher 提交于 3月 12, 2009

The corrupted filesystem patch added a check against zlib trying to
output too much data in the presence of data corruption. This check
triggered if zlib_inflate asked to be called again (Z_OK) with
avail_out == 0 and no more output buffers available. This check proves
to be rather dumb, as it incorrectly catches the case where zlib has
generated all the output, but there are still input bytes to be processed.

This patch does a number of things. It removes the original check and
replaces it with code to not move to the next output buffer if there
are no more output buffers available, relying on zlib to error if it
wants an extra output buffer in the case of data corruption. It
also replaces the Z_NO_FLUSH flag with the more correct Z_SYNC_FLUSH
flag, and makes the error messages more understandable to
non-technical users.
Signed-off-by: NPhillip Lougher <phillip@lougher.demon.co.uk>
Reported-by: NStefan Lippers-Hollmann <s.L-H@gmx.de>

363911d0

Fix _fat_bmap() locking · 3a95ea11

由 OGAWA Hirofumi 提交于 3月 12, 2009

On swapon() path, it has already i_mutex. So, this uses i_alloc_sem
instead of it.
Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: NLaurent GUERBY <laurent@guerby.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3a95ea11

11 3月, 2009 1 次提交

proc: fix kflags to uflags copying in /proc/kpageflags · ad3bdefe

由 Wu Fengguang 提交于 3月 11, 2009

Fix kpf_copy_bit(src,dst) to be kpf_copy_bit(dst,src) to match the
actual call patterns, e.g. kpf_copy_bit(kflags, KPF_LOCKED, PG_locked).

This misplacement of src/dst only affected reporting of PG_writeback,
PG_reclaim and PG_buddy. For others kflags==uflags so not affected.
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ad3bdefe

OpenHarmony / kernel_linux 上一次同步 3 年多

OpenHarmony / kernel_linux
上一次同步 3 年多